On Mar 19, 2015, at 9:24 PM, Tyler Baker tyler.baker@linaro.org wrote:
FYI, not sure if you are on 96boards dev list
Thanks Tyler!
[I'm not on the mailing list, so will have to reply in a new thread.]
---------- Forwarded message ----------
From: Jerome Forissier jerome.forissier@linaro.org
Date: 19 March 2015 at 11:18
Subject: [Dev] HiKey: ARM TF BL1 hangs when compiled with GCC 4.9
To: dev dev@lists.96boards.org
Hi all,
I am running the ARM Trusted Firmware on my HiKey board. I found that BL1 hangs if I build it with the version of GCC that comes with my Ubuntu 14.10 distribution [1] or with another 4.9 build from Linaro [2]. However, it works as expected if I use the 4.8 Linaro build [3] as recommended on the HiKey UEFI wiki [4].
Basically I found two separate issues, and I'm not sure if they are bugs in GCC or HiKey ATF. Here is the story...
With GCC 4.9 [1], the boot hangs, LED#2 blinks and "00000000f20003e8" is printed on UART0 about every second. Let's call this bug #1. The hang occurs in hi6220_pll_init() [5]; execution never gets past this line:
    mmio_write_32(0x0, 0xa5a55a5a);
So I checked the objdump outputs (bl1.dump).
Working compiler [3] gives:
    f98041f0: d2800000 mov x0, #0x0 // #0
    f98041f4: 528b4b41 mov w1, #0x5a5a // #23130
    f98041f8: 72b4b4a1 movk w1, #0xa5a5, lsl #16
    f98041fc: b9000001 str w1, [x0]
Bad compiler [1] produces:
    f9804184: d2800000 mov x0, #0x0 // #0
    f9804188: b900001f str wzr, [x0]
    f980418c: d4207d00 brk #0x3e8
What?! Is there some kind of smart detection in the compiler assuming that one shouldn't write to address zero?
Yes, this is exactly correct. The compiler assumes that it is compiling normal user-level application code and treats references to the null address as undefined behavior, which it is free to optimize as it sees fit. In this case the compiler seems to have decided that the value in the w1 register is never used, because it is live only on the path that leads to the write to null. Note that the compiler still kept the write itself, to keep parity in thrown exceptions between the original and optimized code.
Any code that references the null address as a valid memory location should use the -fno-delete-null-pointer-checks compiler flag. This flag is often added automatically for toolchains that target bare-metal, and, I'm guessing, you are using a "normal" aarch64-linux-gnu toolchain.
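To illustrate the explanation above, here is a minimal sketch in C. The body of mmio_write_32() is an assumption about its general shape (a volatile store through an integer address cast to a pointer), not a copy of the HiKey ATF source, and demo_write() is a hypothetical helper added only so the write can be exercised on a host. For reference, GCC 4.9 introduced -fisolate-erroneous-paths-dereference, which is enabled at -O2 and depends on -fdelete-null-pointer-checks; it turns a statement that dereferences a null pointer into a trap, which is consistent with the "str wzr, [x0]" / "brk #0x3e8" pair in the bad dump.

```c
#include <stdint.h>

/* Assumed shape of an MMIO write helper: a volatile 32-bit store to an
 * address passed as an integer. Not copied from the HiKey ATF tree. */
static inline void mmio_write_32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Hypothetical helper: performs the same write as the line in
 * hi6220_pll_init(), but to a caller-supplied address, so that the
 * sketch can run on a host where address zero is not mapped. */
static uint32_t demo_write(uint32_t *target)
{
    mmio_write_32((uintptr_t)target, 0xa5a55a5aU);
    return *target;
}

/*
 * On the target, the problematic call is:
 *
 *     mmio_write_32(0x0, 0xa5a55a5a);
 *
 * After inlining, the compiler sees a store through a constant null
 * pointer. Building with -fno-delete-null-pointer-checks tells GCC
 * that address zero may hold valid code or data, so the store is
 * emitted as written.
 */
```

The flag has to be in effect for every translation unit that accesses address zero, so adding it to the project-wide compiler flags is probably the simplest option.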
If I change address to 0x4, the code goes past this location but later hangs with the same LED status as above (b0100) and code "0000000096000021" on the console. This is bug #2.
I tracked it down to the initialization of some structures on the stack when entering usb_handle_control_request() [6]. Looks like an alignment issue, since removing the packed attribute on struct usb_endpoint_descriptor [7] fixes the bug.
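As a rough illustration of why the packed attribute matters here, the sketch below compares a packed and a naturally aligned layout. The field names follow the USB 2.0 endpoint descriptor and are an assumption; the struct in the HiKey ATF source [7] may be declared differently. Both struct names and read_u16() are hypothetical. A packed struct has an alignment of 1, so the compiler may place it at any address, and a multi-byte access to it can then be unaligned. On AArch64 such an access faults when alignment checking is on or when the memory is treated as Device memory (for example with the MMU disabled), which is one plausible mechanism for bug #2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout modelled on the USB 2.0 endpoint descriptor. */
struct ep_desc_packed {
    uint8_t  bLength;
    uint8_t  bDescriptorType;
    uint8_t  bEndpointAddress;
    uint8_t  bmAttributes;
    uint16_t wMaxPacketSize;
    uint8_t  bInterval;
} __attribute__((packed));

/* Same fields without the packed attribute: the compiler adds one byte
 * of tail padding and requires 2-byte alignment. */
struct ep_desc_natural {
    uint8_t  bLength;
    uint8_t  bDescriptorType;
    uint8_t  bEndpointAddress;
    uint8_t  bmAttributes;
    uint16_t wMaxPacketSize;
    uint8_t  bInterval;
};

/* Alignment-safe read of a 16-bit value from an arbitrary byte address:
 * memcpy lets the compiler choose accesses that are valid for the
 * destination's alignment instead of assuming the source is aligned. */
static uint16_t read_u16(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

With GCC on common ABIs, the packed variant is 7 bytes with alignment 1, while the natural variant is 8 bytes with alignment 2.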
Let us (Linaro TCWG) know if this doesn't go away with -fno-delete-null-pointer-checks. The best way is to file a bug for GCC product in bugs.linaro.org.
So... What kind of bugs do you guys think we have here, and who should I report them to?
[1] aarch64-linux-gnu-gcc (Ubuntu/Linaro 4.9.1-16ubuntu6) 4.9.1
[2] aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 20141031 (prerelease)
[3] aarch64-linux-gnu-gcc (crosstool-NG linaro-1.13.1-4.8-2014.04 - Linaro GCC 4.8-2014.04) 4.8.3 20140401 (prerelease)
[4] https://github.com/96boards/documentation/wiki/UEFI
[5] https://github.com/96boards/arm-trusted-firmware/blob/bbd623798cb775c4c0445c...
[6] https://github.com/96boards/arm-trusted-firmware/blob/bbd623798cb775c4c0445c...
[7] https://github.com/96boards/arm-trusted-firmware/blob/bbd623798cb775c4c0445c...
-- Maxim Kuvyrkov www.linaro.org
On 20 Mar 2015, at 06:58, Maxim Kuvyrkov maxim.kuvyrkov@linaro.org wrote:
Any code that references the null address as a valid memory location should use the -fno-delete-null-pointer-checks compiler flag. This flag is often added automatically for toolchains that target bare-metal, and, I'm guessing, you are using a "normal" aarch64-linux-gnu toolchain.
If I add -fno-delete-null-pointer-checks, bug #1 goes away, but...
If I change address to 0x4, the code goes past this location but later hangs with the same LED status as above (b0100) and code "0000000096000021" on the console. This is bug #2.
... I get:
Switch to aarch64 mode. CPU0 executes at 0xf9801000! NOTICE: Booting Trusted Firmware NOTICE: BL1: v1.1(release):ad26beb NOTICE: BL1: Built : 07:45:07, Mar 20 2015 NOTICE: succeed to init lpddr3 rank0 dram phy INFO: lpddr3_freq_init, set ddrc 533mhz INFO: init ddr3 rank0 in 533MHz INFO: ddr3 rank1 init pass in 533MHz INFO: Elpida DDR 000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210000000096000021000000009600002100000000960000210
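The 16-digit code repeated in this log, and the "00000000f20003e8" seen for bug #1, can be read as AArch64 exception syndrome (ESR_ELx) values. That reading is an assumption, since the thread does not say what the firmware prints, but the fields do line up with the symptoms. The sketch below splits a value into exception class (EC), instruction length bit (IL) and instruction specific syndrome (ISS) as laid out in the Arm architecture; esr_fields, esr_decode() and esr_class_name() are hypothetical names.

```c
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value (only the low 32 bits are used). */
struct esr_fields {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* instruction length bit, bit [25] */
    uint32_t iss;  /* instruction specific syndrome, bits [24:0] */
};

/* Split a raw syndrome value into its three fields. */
static struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;
    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1ffffff);
    return f;
}

/* Names for the few exception classes relevant to this thread. */
static const char *esr_class_name(unsigned ec)
{
    switch (ec) {
    case 0x3c: return "BRK instruction (AArch64)";
    case 0x24: return "Data Abort from a lower exception level";
    case 0x25: return "Data Abort without a change in exception level";
    default:   return "other";
    }
}
```

Under that assumption, 0xf20003e8 decodes to EC 0x3c (BRK) with ISS 0x3e8, matching the "brk #0x3e8" in the bad dump, and 0x96000021 decodes to EC 0x25 (Data Abort) whose fault status code, the low six bits of the ISS, is 0x21: an alignment fault.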
I tracked it down to the initialization of some structures on the stack when entering usb_handle_control_request() [6]. Looks like an alignment issue, since removing the packed attribute on struct usb_endpoint_descriptor [7] fixes the bug.
I'll try that.
-- Koen Kooi Builds and Baselines | Release Manager Linaro.org | Open source software for ARM SoCs
On 20 Mar 2015, at 08:00, Koen Kooi koen.kooi@linaro.org wrote:
I tracked it down to the initialization of some structures on the stack when entering usb_handle_control_request() [6]. Looks like an alignment issue, since removing the packed attribute on struct usb_endpoint_descriptor [7] fixes the bug.
I'll try that.
Confirmed: I've now booted bl1.bin + fip.bin built with GCC 4.9, and Linux boots fine!
-- Koen Kooi Builds and Baselines | Release Manager Linaro.org | Open source software for ARM SoCs
Dev mailing list Dev@lists.96boards.org https://lists.96boards.org/mailman/listinfo/dev
On Fri, Mar 20, 2015 at 6:58 AM, Maxim Kuvyrkov maxim.kuvyrkov@linaro.org wrote:
Any code that references the null address as a valid memory location should use the -fno-delete-null-pointer-checks compiler flag. This flag is often added automatically for toolchains that target bare-metal, and, I'm guessing, you are using a "normal" aarch64-linux-gnu toolchain.
Yes, I am using the "normal" (non-bare-metal) toolchain, with the -ffreestanding flag. Now I understand what's happening.
If I change address to 0x4, the code goes past this location but later hangs with the same LED status as above (b0100) and code "0000000096000021" on the console. This is bug #2.
I tracked it down to the initialization of some structures on the stack when entering usb_handle_control_request() [6]. Looks like an alignment issue, since removing the packed attribute on struct usb_endpoint_descriptor [7] fixes the bug.
Let us (Linaro TCWG) know if this doesn't go away with -fno-delete-null-pointer-checks. The best way is to file a bug for GCC product in bugs.linaro.org.
-fno-delete-null-pointer-checks does make the problem go away. So no compiler issue here ;-)
Thanks for the explanation.