Since build 5.2.2022, WCS supports hardware acceleration using an NVIDIA GPU for video decoding and encoding.
If a stream is decoded on CPU, it will be encoded on CPU too (except in a mixer: incoming streams are decoded on CPU, but the outgoing stream may be encoded on GPU)
GPU transcoding is not supported for:
In all these cases the stream must be decoded on CPU, so hardware acceleration should be disabled
It is recommended to deploy a separate server without GPU for the cases listed above.
For hardware acceleration to work, NVIDIA CUDA drivers must be installed on the server according to the official manual. The CUDA toolkit itself does not need to be installed:

```shell
sudo apt-get install -y cuda-drivers
```
If the drivers are installed successfully, the nvidia-smi utility displays the current GPU state.
Hardware acceleration support is enabled by the following parameter:

```
enable_hardware_acceleration=true
```
When hardware acceleration is enabled, the CUDA encoder/decoder takes priority for all supported use cases!
The default GPU settings are enough for minimal transcoding sessions and testing, but for production use it is necessary to set some parameters, such as the maximum number of encoders available per GPU and per server. The configuration file /usr/local/FlashphonerWebCallServer/conf/gpu_config.json is used for this:
```json
[
  {
    "gpuName": "Tesla P100-PCIE-16GB",
    "disabled": false,
    "config": {
      "maxEncodeSessions": 95,
      "maxHostEncodeSessions": 95,
      "maxDecodeSessions": 19,
      "maxEncoderUtilization": 90,
      "maxDecoderUtilization": 90,
      "minMemoryAvailable": "4 GB"
    }
  }
]
```

or, by PCI bus Id:

```json
[
  {
    "pciBusId": "0000:02:00.0",
    "disabled": false,
    "config": {
      "maxEncodeSessions": 95,
      "maxHostEncodeSessions": 95,
      "maxDecodeSessions": 19,
      "maxEncoderUtilization": 90,
      "maxDecoderUtilization": 90,
      "minMemoryAvailable": "4 GB"
    }
  }
]
```
Where:
If the server has more than one GPU, a configuration with a GPU name is applied to all the GPUs with that name. A configuration with a PCI bus Id is applied only to the GPU with that Id, because PCI bus Ids are unique.
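The matching logic can be sketched as follows. This is a hypothetical helper, not WCS code, and it assumes that an entry with a matching pciBusId takes precedence over a name-based entry, since PCI bus Ids are unique while names are not:

```python
# Minimal sketch (not WCS code): pick the gpu_config.json entry that applies
# to a given GPU. An entry with a matching "pciBusId" is assumed to take
# precedence over a name-based one, since PCI bus Ids are unique.
def select_gpu_config(entries, gpu_name, pci_bus_id):
    by_bus = None
    by_name = None
    for entry in entries:
        if entry.get("pciBusId") == pci_bus_id:
            by_bus = entry
        elif entry.get("gpuName") == gpu_name:
            by_name = entry
    return by_bus or by_name

entries = [
    {"gpuName": "Tesla P100-PCIE-16GB", "config": {"maxEncodeSessions": 95}},
    {"pciBusId": "0000:02:00.0", "config": {"maxEncodeSessions": 50}},
]
# The PCI bus Id entry is chosen for the GPU at 0000:02:00.0
print(select_gpu_config(entries, "Tesla P100-PCIE-16GB", "0000:02:00.0"))
```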
There is a special tool to test the GPU load capability on the server and to generate a configuration file. The tool should be launched as follows:

```shell
/usr/local/FlashphonerWebCallServer/tools/gpu_calibration_tool.sh sample.mp4 --separate-test --interval=20
```
Where:
The tool uses the encoding profiles set in the /usr/local/FlashphonerWebCallServer/conf/hls_abr_profiles.yml file (see HLS ABR on a single node) to test encoding.
The tool runs the test and creates a configuration file with the maximum GPU parameters according to the test results. If more than one chip is tested, PCI bus Ids will be set in the configuration file. If only one chip is tested (no --separate-test key), the GPU name will be set in the configuration file.
If the server has different types of NVIDIA GPUs and the --separate-test key is not set, one GPU of each type will be tested.
WCS must be stopped during the test!
A WebRTC stream's publishing resolution may occasionally change. Also, a screen sharing stream capturing a single window changes its resolution when the window is resized. GPU decoder parameters should be adjusted after every stream resolution change. The default parameters

```
hardware_acceleration_enable_soft_reconfiguration=true
hardware_acceleration_reconfigure_max_width=1920
hardware_acceleration_reconfigure_max_height=1088
```

allow only the resolution settings to be changed when the stream resolution has decreased. However, the decoder settings will be fully reset if the resolution increases above the configured threshold; in effect, a new decoder is created.
The following parameter

```
hardware_acceleration_enable_soft_reconfiguration=false
```

makes the decoder settings reset on every resolution change. This may dramatically affect performance.
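The behavior described above can be sketched as a decision function. This is an illustration of the documented rules, not actual server code, and it assumes the default threshold values:

```python
# Illustration of the documented behavior (not actual WCS code): decide
# whether a resolution change can be handled by soft reconfiguration or
# requires the decoder to be recreated.
def decoder_must_be_recreated(new_width, new_height,
                              soft_reconfiguration=True,
                              max_width=1920, max_height=1088):
    if not soft_reconfiguration:
        # With soft reconfiguration disabled, every change resets the decoder
        return True
    # Soft reconfiguration covers changes within the configured maximum;
    # growing beyond it resets the decoder, as if a new one were created
    return new_width > max_width or new_height > max_height

print(decoder_must_be_recreated(1280, 720))    # decrease within limits
print(decoder_must_be_recreated(3840, 2160))   # exceeds the threshold
```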
Current GPU usage statistics may be obtained using the REST API. The REST query should be an HTTP/HTTPS POST request as follows:
Where:
Get current GPU usage statistics
```
POST /rest-api/gpu/info HTTP/1.1
Host: localhost:8081
Content-Type: application/json
```
```
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json

{
  "cudaVersion": "12.4",
  "nvencVersion": "12.2",
  "driverVersion": "550.90.07",
  "nvmlVersion": "12.550.90.07",
  "numOfAvailableDevices": 1,
  "numOfDecodingSessions": 1,
  "numOfEncodingSessions": 5,
  "numOfHostEncodingSessions": 5,
  "deviceList": [
    {
      "name": "Tesla P100-PCIE-16GB",
      "pciBusId": "0000:02:00.0",
      "computeCapability": "6.0",
      "computeMode": 0,
      "computeModeDescription": "Default compute mode (Multiple contexts allowed per device)",
      "numOfDecodingSessions": 1,
      "numOfEncodingSessions": 5,
      "numOfHostEncodingSessions": 5,
      "fpsStats": {
        "decoderFps": 30,
        "encoderFps": 30
      },
      "nativeEncoderStats": {
        "averageFps": 29,
        "averageLatency": 1225
      },
      "utilizationState": {
        "computeUtilization": 1,
        "memUtilization": 0,
        "decoderUtilization": 5,
        "encoderUtilization": 4,
        "totalMem": "16 GB",
        "freeMem": "15.26 GB",
        "usedMem": "752.25 MB",
        "usedPciBandwidth": "984.76 KB"
      },
      "supportedEncodeCodecs": {
        "H265": { "minWidth": 65, "minHeight": 33, "maxWidth": 4096, "maxHeight": 4096 },
        "H264": { "minWidth": 33, "minHeight": 17, "maxWidth": 4096, "maxHeight": 4096 }
      },
      "supportedDecodeCodecs": {
        "H265": { "minWidth": 144, "minHeight": 144, "maxWidth": 4096, "maxHeight": 4096 },
        "H264": { "minWidth": 48, "minHeight": 16, "maxWidth": 4096, "maxHeight": 4096 }
      }
    }
  ]
}
```
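For example, the per-device load can be summarized from such a response with a short script. The endpoint URL and port below are taken from the example request and may differ in your deployment; summarize_gpu_info is a hypothetical helper, not part of WCS:

```python
import json
import urllib.request

def summarize_gpu_info(stats):
    # Build a one-line load summary per GPU from a /rest-api/gpu/info response
    summary = []
    for dev in stats.get("deviceList", []):
        u = dev["utilizationState"]
        summary.append(
            "{}: encoder {}%, decoder {}%, free memory {}".format(
                dev["name"], u["encoderUtilization"],
                u["decoderUtilization"], u["freeMem"]))
    return summary

# Query a running WCS server (URL and port as in the example request above):
# req = urllib.request.Request("http://localhost:8081/rest-api/gpu/info",
#                              method="POST", data=b"{}",
#                              headers={"Content-Type": "application/json"})
# stats = json.load(urllib.request.urlopen(req))
# print("\n".join(summarize_gpu_info(stats)))
```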
Code | Reason |
---|---|
200 | OK |
500 | Internal server error |
Parameter | Description | Example |
---|---|---|
cudaVersion | CUDA version | 12.4 |
nvencVersion | NVENC version | 12.2 |
driverVersion | NVIDIA drivers version | 550.90.07 |
nvmlVersion | NVIDIA management library version | 12.550.90.07 |
numOfAvailableDevices | GPU available count | 1 |
numOfDecodingSessions | Decoders count | 1 |
numOfEncodingSessions | Encoders count | 5 |
numOfHostEncodingSessions | Encoders count per server | 5 |
name | GPU name | Tesla P100-PCIE-16GB |
pciBusId | GPU PCI bus Id | 0000:02:00.0 |
computeCapability | GPU capabilities Id | 6.0 |
computeMode | GPU compute mode | 0 |
computeModeDescription | GPU compute mode description | Default compute mode (Multiple contexts allowed per device) |
decoderFps | Decoding FPS | 30 |
encoderFps | Encoding FPS | 30 |
averageFps | Average FPS by native GPU stats | 29 |
averageLatency | Average latency in microseconds by native GPU stats | 1225 |
computeUtilization | GPU compute utilization percent | 1 |
memUtilization | Memory utilization percent | 0 |
decoderUtilization | Decoder utilization percent | 5 |
encoderUtilization | Encoder utilization percent | 4 |
totalMem | Total memory amount | 16 GB |
freeMem | Free memory amount | 15.26 GB |
usedMem | Used memory amount | 752.25 MB |
usedPciBandwidth | PCI bus bandwidth used | 984.76 KB |
minWidth | Minimal picture width to decode/encode | 33 |
minHeight | Minimal picture height to decode/encode | 17 |
maxWidth | Maximum picture width to decode/encode | 4096 |
maxHeight | Maximum picture height to decode/encode | 4096 |