hls_hbm_memcopy_512
Code location:
Code can be found at:https://github.com/OpenCAPI/oc-accel/blob/master/actions/hls_hbm_memcopy_512/
Overview:
This is an action using HLS, 512 bits bus wide towards OpenCAPI interface which is 1024 bits wide.
To achieve this a 1024 to 512b converter is introduced, as P9 OpenCAPI uses a 1024 bits wide bus.
It can be checked in /action.Kconfig file that the ACTION_HALF_WIDTH bloc is set for this example, so the interface will implement the half width converter.
This allows older actions to be converted in a snap at the cost of lower performance.
This generic test can be also used to evaluate the throughput to/from FPGA and LCL memories (local can be DDR or HBM depending on cards used).
It reports bandwidth of:
- Host -> FPGA_RAM
- FPGA_RAM -> Host
- FPGA (LCL -> RAM)
- FPGA (RAM -> LCL)
hw_test
Example on IC922 with a OC-AD9V3 card:
$ cd actions/hls_memcopy_512/tests
$ sudo ./hw_throughput_test.sh -dINCR
Build Date: [00000008] 0000202009150920
+-------------------------------------------------------------------------------+
| OC-Accel hls_memcopy_512 Throughput (MBytes/s) |
+-------------------------------------------------------------------------------+
+------------LCL stands for DDR or HBM memory accordingto hardware--------------+
bytes Host->FPGA_RAM FPGA_RAM->Host FPGA(LCL->BRAM) FPGA(BRAM->LCL)
-------------------------------------------------------------------------------
512 10.240 10.240 10.449 10.449
1024 1.497 20.480 21.333 25.600
2048 40.960 41.796 42.667 41.796
4096 81.920 81.920 83.592 83.592
8192 132.129 11.855 11.872 11.977
16384 23.918 321.255 12.319 12.319
32768 24.582 46.946 47.628 47.628
65536 94.432 95.394 94.980 94.980
131072 188.593 189.959 187.782 186.979
262144 370.260 366.635 372.364 371.309
524288 717.220 729.191 718.203 717.220
1048576 1354.749 1361.787 1353.001 1476.868
2097152 2621.440 2427.259 2441.388 2467.238
4194304 4084.035 2523.649 2520.615 2593.880
8388608 4167.217 4120.141 4158.953 4136.394
16777216 6028.464 8140.328 7084.973 8101.022
33554432 9877.666 8212.049 7983.448 8152.194
67108864 9884.941 9565.117 9774.084 9765.551
134217728 10507.925 10888.110 10844.124 10824.883
268435456 12041.244 11543.625 11482.887 11460.336
To get the best results, it may be useful to ensure you have the ocapi link attached to the core where the program is executed. If you have 2 nodes (check with numactl -s), you can try the 4 following combinations:
sudo numactl -m0 -N0 ./oc-accel/actions/hls_memcopy_1024/tests/hw_throughput_test.sh -d INCR
sudo numactl -m8 -N0 ./oc-accel/actions/hls_memcopy_1024/tests/hw_throughput_test.sh -d INCR
sudo numactl -m0 -N8 ./oc-accel/actions/hls_memcopy_1024/tests/hw_throughput_test.sh -d INCR
sudo numactl -m8 -N8 ./oc-accel/actions/hls_memcopy_1024/tests/hw_throughput_test.sh -d INCR
Note:
"-m" stands for memory: allocate selected memory from nodes
"-N" stands for nodes: execute command on the CPUs of selected nodes