# ACL Execution Provider
The ACL Execution Provider enables accelerated performance on Arm®-based CPUs through Arm Compute Library.
## Build
For build instructions, please see the build page.
## Usage
### C/C++
```c++
#include <onnxruntime_cxx_api.h>

Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions sf;
bool enable_fast_math = true;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ACL(sf, enable_fast_math));
```
The C API details are here.
### Python
```python
import onnxruntime

providers = [("ACLExecutionProvider", {"enable_fast_math": "true"})]
sess = onnxruntime.InferenceSession("model.onnx", providers=providers)
```
## Performance Tuning
Arm Compute Library has a fast math mode that can increase performance of the MatMul and Conv operators, with some potential decrease in accuracy. It is disabled by default.
When using onnxruntime_perf_test, use the flag -e acl to enable the ACL Execution Provider. You can additionally use -i 'enable_fast_math|true' to enable fast math.
Arm Compute Library uses the ONNX Runtime intra-operator thread pool when running via the execution provider. You can control the size of this thread pool using the -x option.
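Putting the flags above together, a typical benchmarking invocation might look like the following sketch. The binary location and model path are placeholders; adjust them for your build tree and model.

```shell
# Benchmark model.onnx on the ACL execution provider (-e acl),
# with fast math enabled (-i) and a 4-thread intra-op pool (-x).
./onnxruntime_perf_test -e acl -i 'enable_fast_math|true' -x 4 model.onnx
```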
## Supported Operators
| Operator | Supported types |
|---|---|
| AveragePool | float |
| BatchNormalization | float |
| Concat | float |
| Conv | float, float16 |
| FusedConv | float |
| FusedMatMul | float, float16 |
| Gemm | float |
| GlobalAveragePool | float |
| GlobalMaxPool | float |
| MatMul | float, float16 |
| MatMulIntegerToFloat | uint8, int8, uint8+int8 |
| MaxPool | float |
| NhwcConv | float |
| Relu | float |
| QLinearConv | uint8, int8, uint8+int8 |