10.0-RC1, armv6: "pfctl -s state" crashes on BeagleBone Black due to unaligned access

Asked by 7 months ago
Hi, I am running 10.0-RC1 arm.armv6 on the BeagleBone Black. The "pfctl -s state" command is crashing when trying to print the second entry. struct pfsync_state has a size that is not divisiable by 4 or 8 leading to the second entry in the returned state array not being aligned and pfctl core dumps on Bus error when trying to access a uint32_t field. (gdb) bt #0 print_host (addr=0x2085a11a, port=7660, af=2 '\002', opts=1024) at /usr/src/sbin/pfctl/pf_print_state.c:178 #1 0x00021c4c in print_state (s=0x2085a0f2, opts=1024) at /usr/src/sbin/pfctl/pf_print_state.c:236 #2 0x0000c664 in pfctl_show_states (dev=<value optimized out>, iface=0x0, opts=1024) at /usr/src/sbin/pfctl/pfctl.c:1095 sizeof(struct pfsync_state_key) is 36 sizeof(struct pfsync_state_peer) is 32 sizeof(struct pf_addr) is 16 sizeof(struct pfsync_state) is 242 Removing the __spare[2] field will allow the struct to be aligned on 8 bytes for the u_int64_t id field and also cover the uint32_t fields alignment but this will break KBI. I am currently using an inefficient workaround in pfctl_show_states that memcpy each entry to a struct pfsync_state on the stack ensuring each call to print_state receives an aligned struct. 10.0-RC1 World and kernel were compiled in a VirtualBox VM running 9.2-RELEASE-p2 i386. clang and ARM_EABI used as the default make options. Regards, Guy

Your Answer

Name:
Reply:

All Answers

Answer by 7 months ago
I think this was changed in later RC versions. Warner
Answer by 7 months ago
Do you mean r259308 in 10-STABLE? I compiled a new kernel with the change applied and still get SIGBUS on unaligned access. From my reading of the ARM manual for Cortex-A it looks like setting CPU_CONTROL_AFLT_ENABLE to 1 means you want to get a trap on unaligned access. "1 = Strict alignment fault checking enabled." I see that dab_align delivers a SIGBUS on user-mode alignment fault. Regards, Guy
Answer by 7 months ago
Guy, G> I am running 10.0-RC1 arm.armv6 on the BeagleBone Black. G> The "pfctl -s state" command is crashing when trying to print the G> second entry. G> G> struct pfsync_state has a size that is not divisiable by 4 or 8 leading to the G> second entry in the returned state array not being aligned and pfctl G> core dumps on Bus error when trying to access a uint32_t field. G> G> (gdb) bt G> #0 print_host (addr=0x2085a11a, port=7660, af=2 '\002', opts=1024) at G> /usr/src/sbin/pfctl/pf_print_state.c:178 G> #1 0x00021c4c in print_state (s=0x2085a0f2, opts=1024) at G> /usr/src/sbin/pfctl/pf_print_state.c:236 G> #2 0x0000c664 in pfctl_show_states (dev=<value optimized out>, G> iface=0x0, opts=1024) at /usr/src/sbin/pfctl/pfctl.c:1095 G> G> sizeof(struct pfsync_state_key) is 36 G> sizeof(struct pfsync_state_peer) is 32 G> sizeof(struct pf_addr) is 16 G> sizeof(struct pfsync_state) is 242 G> G> Removing the __spare[2] field will allow the struct to be aligned on 8 bytes G> for the u_int64_t id field and also cover the uint32_t fields alignment G> but this will break KBI. G> G> I am currently using an inefficient workaround in pfctl_show_states G> that memcpy each entry to a struct pfsync_state on the stack G> ensuring each call to print_state receives an aligned struct. G> G> 10.0-RC1 World and kernel were compiled in a VirtualBox VM running G> 9.2-RELEASE-p2 i386. G> clang and ARM_EABI used as the default make options. For pf we are ready to break KBI. It uses same structs for internal kernel representation and for ioctl() API and this is actually a bug. Until it is properly fixed, we are doomed to break KBI always. Unfortunately, pfsync_state is not only a KBI but also a wire protocol for pfsync(4). We can't break this, since that would make different FreeBSD versions not exchanging states properly. Well, <= 8.x already is incompatible with >= 9.x, thanks yet another OpenBSD import. But we don't want to introduce another one. I will try to fix this making new structure for the ioctl. That will mean moving slowly towards divorcing internal structures and ioctl ones. I'd appreciate if you file a PR on that, so that problem won't leave forgotten in the mailing list. You can even code the bugfix :) Thanks!
Answer by 7 months ago
Guy, G> sizeof(struct pfsync_state_key) is 36 G> sizeof(struct pfsync_state_peer) is 32 G> sizeof(struct pf_addr) is 16 G> sizeof(struct pfsync_state) is 242 I am also afraid that the pfsync(4) itself isn't alignment safe. And receiving and processing a pfsync packet with couple of states would panic an arm box. Is it possible for you to check this?
Answer by 7 months ago
Well pfsync has a version in its header so its quite possible to support many of them.
Answer by 7 months ago
Hi, I filled arm/185617 with some updated information. After further looking at why the kernel doesn't crash when filling the pfsync_state array and only the userspace pfctl is crashing I see that pfsync_state has the __packed attribute which means on arm unaligned access is used so there is no problem handling an unaligned pfsync_state. The reason pfctl crashes is because it passes a structure field as a pf_addr pointer. struct pf_addr is not __packed so on arm word access will be used, triggering the unaligned fault. So there is indeed no need to break the pfsync protocol. In if_pfsync.c I think all the accesses to pfsync_state are done using a pfsync_state pointer, there is no passing of struct fields as separate pointers and since the struct is covered by __packed there won't be an unaligned access. Thanks, Guy
Answer by 7 months ago
Quoted message by Gleb Smirnoff 7 months ago
Guy, G> I am running 10.0-RC1 arm.armv6 on the BeagleBone Black. G> The "pfctl -s state" command is crashing when trying to print the G> second entry. G> G> struct pfsync_state has a size that is not divisiable by 4 or 8 leading to the G> second entry in the returned state array not being aligned and pfctl G> core dumps on Bus error when trying to access a uint32_t field. G> G> (gdb) bt G> #0 print_host (addr=0x2085a11a, port=7660, af=2 '\002', opts=1024) at G> /usr/src/sbin/pfctl/pf_print_state.c:178 G> #1 0x00021c4c in print_state (s=0x2085a0f2, opts=1024) at G> /usr/src/sbin/pfctl/pf_print_state.c:236 G> #2 0x0000c664 in pfctl_show_states (dev=<value optimized out>, G> iface=0x0, opts=1024) at /usr/src/sbin/pfctl/pfctl.c:1095 G> G> sizeof(struct pfsync_state_key) is 36 G> sizeof(struct pfsync_state_peer) is 32 G> sizeof(struct pf_addr) is 16 G> sizeof(struct pfsync_state) is 242 G> G> Removing the __spare[2] field will allow the struct to be aligned on 8 bytes G> for the u_int64_t id field and also cover the uint32_t fields alignment G> but this will break KBI. G> G> I am currently using an inefficient workaround in pfctl_show_states G> that memcpy each entry to a struct pfsync_state on the stack G> ensuring each call to print_state receives an aligned struct. G> G> 10.0-RC1 World and kernel were compiled in a VirtualBox VM running G> 9.2-RELEASE-p2 i386. G> clang and ARM_EABI used as the default make options. For pf we are ready to break KBI. It uses same structs for internal kernel representation and for ioctl() API and this is actually a bug. Until it is properly fixed, we are doomed to break KBI always. Unfortunately, pfsync_state is not only a KBI but also a wire protocol for pfsync(4). We can't break this, since that would make different FreeBSD versions not exchanging states properly. Well, <= 8.x already is incompatible with >= 9.x, thanks yet another OpenBSD import. But we don't want to introduce another one. I will try to fix this making new structure for the ioctl. That will mean moving slowly towards divorcing internal structures and ioctl ones. I'd appreciate if you file a PR on that, so that problem won't leave forgotten in the mailing list. You can even code the bugfix :) Thanks!
Guy Yur wrote this message on Fri, Jan 10, 2014 at 00:17 +0200: Ok, that makes sense... so, either we mark struct pf_addr as __packed, or we do some nasty stuff, like the following in print_host: struct { struct pf_addr a } *uaddr __packed; uaddr = addr; aw.v.a.addr = uaddr->a; it's not pretty, but I believe it would work... "All that I will do, has been done, All that I have, has not."
Answer by 7 months ago
Quoted message by Guy Yur 7 months ago
Hi, I filled arm/185617 with some updated information. After further looking at why the kernel doesn't crash when filling the pfsync_state array and only the userspace pfctl is crashing I see that pfsync_state has the __packed attribute which means on arm unaligned access is used so there is no problem handling an unaligned pfsync_state. The reason pfctl crashes is because it passes a structure field as a pf_addr pointer. struct pf_addr is not __packed so on arm word access will be used, triggering the unaligned fault. So there is indeed no need to break the pfsync protocol. In if_pfsync.c I think all the accesses to pfsync_state are done using a pfsync_state pointer, there is no passing of struct fields as separate pointers and since the struct is covered by __packed there won't be an unaligned access. Thanks, Guy
For performance reasons, I don't think pf_addr should be marked as __packed. I attached the changes I am now using in print_state() since there is no need to copy the full pfsync_state, only pf_addr. I converted sk and nk from pointers to structs on the stack and using struct copy. pf_addr is 16 bytes. Regards, Guy
Answer by 7 months ago
Quoted message by John-Mark Gurney 7 months ago
Guy Yur wrote this message on Fri, Jan 10, 2014 at 00:17 +0200: Ok, that makes sense... so, either we mark struct pf_addr as __packed, or we do some nasty stuff, like the following in print_host: struct { struct pf_addr a } *uaddr __packed; uaddr = addr; aw.v.a.addr = uaddr->a; it's not pretty, but I believe it would work... "All that I will do, has been done, All that I have, has not."
Guy Yur wrote this message on Fri, Jan 10, 2014 at 01:04 +0200: Did you look at using the above trick? Since we are iterating over a list, that'll be a lot of copies, plus, I'm not sure that your fix will be guaranteed to work for ever.. since there isn't a requirement that the copy happens w/ bcopy/memcpy or some other copy routine that assumes things might not be aligned... Specificly these: - sk = >key[PF_SK_STACK]; - nk = >key[PF_SK_WIRE]; + sk = s->key[PF_SK_STACK]; + nk = s->key[PF_SK_WIRE]; since s->key is already assumed to be aligned, a future compiler could be smart enough to say, I'm not going to use the stack.. That would/could happen if print_host's addr arg grew a const which it could... Also, I just realized that some of the lines modify sk (setting port), but you don't write those modifications back to s->key[PF_SK_STACK]... "All that I will do, has been done, All that I have, has not."
Answer by 7 months ago
Quoted message by Guy Yur 7 months ago
For performance reasons, I don't think pf_addr should be marked as __packed. I attached the changes I am now using in print_state() since there is no need to copy the full pfsync_state, only pf_addr. I converted sk and nk from pointers to structs on the stack and using struct copy. pf_addr is 16 bytes. Regards, Guy
Right. The correct fix would be to have a separate struct for the ioctl that can be aligned as Gleb suggested. I will try to prepare and test such changes. If new ioctls are added, the KBI can also be preserved. pfsync_state_export is used by if_pfsync.c and pf_ioctl.c so there will be duplicated code even if reusing the old ioctls with the new struct. I thought that because s itself is __packed and key is an array inside s the __packed will apply to it too, since the disassembly showed clang chose to do a memcpy, I don't know if that is true or not. An explicit bcopy is safer. I see that the function already does a bcopy of the u_int64_t id field so it has some assumption that the structure might not be aligned. If a new always aligned structure is used for the ioctl, the bcopy of id can also be avoided. The print function doesn't need to modify s->key, the port changes are only for passing to print_host. Regards, Guy