hls_hbm_memcopy_1024

Code location:

Code can be found at: https://github.com/OpenCAPI/oc-accel/blob/master/actions/hls_hbm_memcopy_1024/

Overview:

This is an action written in HLS that uses a 1024-bit data path and the HBM memory found on the FPGA of cards such as the OC-9Hx. 1024 bits is the optimum configuration, since POWER9 OpenCAPI uses a 1024-bit-wide bus.

You can check in the actions/Kconfig file that the ACTION_HALF_WIDTH block is not used for this example, so the interface uses the full OpenCAPI 1024-bit bus.

Note on HBM usage in OC-ACCEL:

OC-9H3 and OC-9H7 cards, for example, each contain 8 GB of HBM, reachable through up to 32 AXI buses, each connected to a 256 MB HBM memory. The test done here exercises only one of these HBM memories.

Since all of these HBM memories can be accessed independently and in parallel, the overall throughput of the 8 GB of HBM can be up to 32 times that of a single memory.

The HBM can also be configured in different ways (one 8 GB HBM, multiple accesses to 256 MB modules, ...).

The default choice in the Kconfig menu is 12 HBM memories operating in parallel. Reducing this to 1 memory can easily be done by changing both:

  • the setting in the Kconfig menu
  • the #define HBM_AXI_IF_NB parameter in the hls_hbm_memcopy_1024.cpp code, setting it to 1.

Bandwidth Evaluation Test

This generic test can also be used to evaluate the throughput between the host and the FPGA, and between the FPGA and its local (LCL) memories (local memory can be DDR or HBM depending on the card used).

It reports bandwidth of:

  • Host -> FPGA_RAM
  • FPGA_RAM -> Host
  • FPGA (HBM -> RAM)
  • FPGA (RAM -> HBM)

hw_throughput_test

Example on IC922 with a OC-AD9H3 card:

$ cd actions/hls_hbm_memcopy_1024/tests
$ sudo ./hw_throughput_test.sh -dINCR
+-------------------------------------------------------------------------------+
|            OC-Accel hls_hbm_memcopy_1024 Throughput (MBytes/s)                |
+-------------------------------------------------------------------------------+
       bytes   Host->FPGA_RAM   FPGA_RAM->Host   FPGA(HBM->RAM)   FPGA(RAM->HBM)
 -------------------------------------------------------------------------------
         512            0.739            0.742            0.741            9.481
        1024           17.965           18.963            1.488            1.482
        2048            2.955            2.934            2.968            3.012
        4096            5.945            5.885            5.902            5.911
        8192           11.924           11.907           11.703          148.945
       16384          292.571          292.571          227.556           23.406
       32768           46.612           47.080           46.217          555.390
       65536          897.753         1110.780         1024.000          101.292
      131072          186.447          185.918          185.918          199.805
      262144          403.298          366.635          359.594         2759.411
      524288         6393.756         5825.422          673.892          682.667
     1048576         1396.240         7231.559         1216.445         1299.351
     2097152        12409.183         8774.695         5475.593         5282.499
     4194304         4832.147         4185.932         2166.479         2075.361
     8388608         4639.717         4269.012         3063.772         4080.062
    16777216        10343.536         9805.503         5027.634         4178.634
    33554432        10485.760        10277.008         5053.378         4919.283
    67108864        13166.346        13560.086         5627.106         5445.822
   134217728        15080.644        16192.270         5969.477         5760.912
   268435456        16460.354        17956.750         6062.776         5929.392
   536870912        17014.893        19000.917
  1073741824        17391.630        19322.677
ok
Test OK

To get the best results, it may be useful to ensure that the OpenCAPI link is attached to the node where the program executes. If you have 2 nodes (check with numactl -s), you can try the following four combinations:

sudo numactl -m0 -N0 ./oc-accel/actions/hls_hbm_memcopy_1024/tests/hw_throughput_test.sh -d INCR
sudo numactl -m8 -N0 ./oc-accel/actions/hls_hbm_memcopy_1024/tests/hw_throughput_test.sh -d INCR
sudo numactl -m0 -N8 ./oc-accel/actions/hls_hbm_memcopy_1024/tests/hw_throughput_test.sh -d INCR
sudo numactl -m8 -N8 ./oc-accel/actions/hls_hbm_memcopy_1024/tests/hw_throughput_test.sh -d INCR

Note:

"-m" stands for memory: allocate selected memory from nodes

"-N" stands for nodes: execute command on the CPUs of selected nodes