Since build 5.2.2022, WCS supports hardware acceleration using an NVIDIA GPU for video decoding and encoding.
If a stream is decoded on CPU, it will be encoded on CPU too (except in a mixer: incoming streams are decoded on CPU, but the outgoing stream may be encoded on GPU)
GPU transcoding is not supported for:
In all these cases the stream must be decoded on CPU, so hardware acceleration should be disabled
It is recommended to deploy a separate server without GPU for the cases listed above.
For hardware acceleration to work, NVIDIA CUDA drivers must be installed on the server according to the official manual. The CUDA toolkit itself does not need to be installed:

```shell
sudo apt-get install -y cuda-drivers
```
If the drivers are installed successfully, the nvidia-smi utility displays the current GPU state.
Hardware acceleration support is enabled by the following parameter:

```
enable_hardware_acceleration=true
```
When hardware acceleration is enabled, the CUDA encoder/decoder takes priority for all supported use cases!
The default GPU settings are enough for minimal transcoding sessions and testing, but for production use it is necessary to set some parameters, such as the maximum number of encoders available per GPU and per server. The configuration file /usr/local/FlashphonerWebCallServer/conf/gpu_config.json is used for this:
```json
[
  {
    "gpuName": "Tesla P100-PCIE-16GB",
    "disabled": false,
    "config": {
      "maxEncodeSessions": 95,
      "maxHostEncodeSessions": 95,
      "maxDecodeSessions": 19,
      "maxEncoderUtilization": 90,
      "maxDecoderUtilization": 90,
      "minMemoryAvailable": "4 GB"
    }
  }
]
```

or, by PCI bus Id:

```json
[
  {
    "pciBusId": "0000:02:00.0",
    "disabled": false,
    "config": {
      "maxEncodeSessions": 95,
      "maxHostEncodeSessions": 95,
      "maxDecodeSessions": 19,
      "maxEncoderUtilization": 90,
      "maxDecoderUtilization": 90,
      "minMemoryAvailable": "4 GB"
    }
  }
]
```
Where:
If the server has more than one GPU, a configuration with a GPU name is applied to all the GPUs with that name. A configuration with a PCI bus Id is applied only to the GPU with that Id, because PCI bus Ids are unique.
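The matching logic can be sketched as follows. This is a hypothetical helper, not WCS code, and it assumes that an entry with a matching pciBusId takes precedence over a name-based entry, since PCI bus Ids are unique while names are not:

```python
# Minimal sketch (not WCS code): pick the gpu_config.json entry that applies
# to a given GPU. An entry with a matching "pciBusId" is assumed to take
# precedence over a name-based one, since PCI bus Ids are unique.
def select_gpu_config(entries, gpu_name, pci_bus_id):
    by_bus = None
    by_name = None
    for entry in entries:
        if entry.get("pciBusId") == pci_bus_id:
            by_bus = entry
        elif entry.get("gpuName") == gpu_name:
            by_name = entry
    return by_bus or by_name

entries = [
    {"gpuName": "Tesla P100-PCIE-16GB", "config": {"maxEncodeSessions": 95}},
    {"pciBusId": "0000:02:00.0", "config": {"maxEncodeSessions": 50}},
]
# The PCI bus Id entry is chosen for the GPU at 0000:02:00.0
print(select_gpu_config(entries, "Tesla P100-PCIE-16GB", "0000:02:00.0"))
```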
There is a special tool to test the GPU load capability on the server and to generate a configuration file. The tool should be launched as follows:

```shell
/usr/local/FlashphonerWebCallServer/tools/gpu_calibration_tool.sh sample.mp4 --separate-test --interval=20
```
Where:
The tool uses the encoding profiles set in the /usr/local/FlashphonerWebCallServer/conf/hls_abr_profiles.yml file (see HLS ABR on a single node) to test encoding.
The tool runs the test and creates a configuration file with the maximum GPU parameters according to the test results. If more than one chip is tested, PCI bus Ids will be set in the configuration file. If only one chip is tested (no --separate-test key), the GPU name will be set in the configuration file.
If the server has different types of NVIDIA GPUs and the --separate-test key is not set, one GPU of each type will be tested.
WCS must be stopped during the test!
A WebRTC stream's publishing resolution may occasionally change. Also, a screen sharing stream capturing a single window changes its resolution when the window is resized. GPU decoder parameters should be adjusted after every stream resolution change. The default parameters

```
hardware_acceleration_enable_soft_reconfiguration=true
hardware_acceleration_reconfigure_max_width=1920
hardware_acceleration_reconfigure_max_height=1088
```

allow only the resolution settings to be changed when the stream resolution has decreased. However, the decoder settings will be fully reset if the resolution increases above the configured threshold; in effect, a new decoder is created.
The following parameter

```
hardware_acceleration_enable_soft_reconfiguration=false
```

makes the decoder settings reset on every resolution change. This may dramatically affect performance.
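The behavior described above can be sketched as a decision function. This is an illustration of the documented rules, not actual server code, and it assumes the default threshold values:

```python
# Illustration of the documented behavior (not actual WCS code): decide
# whether a resolution change can be handled by soft reconfiguration or
# requires the decoder to be recreated.
def decoder_must_be_recreated(new_width, new_height,
                              soft_reconfiguration=True,
                              max_width=1920, max_height=1088):
    if not soft_reconfiguration:
        # With soft reconfiguration disabled, every change resets the decoder
        return True
    # Soft reconfiguration covers changes within the configured maximum;
    # growing beyond it resets the decoder, as if a new one were created
    return new_width > max_width or new_height > max_height

print(decoder_must_be_recreated(1280, 720))    # decrease within limits
print(decoder_must_be_recreated(3840, 2160))   # exceeds the threshold
```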
Current GPU usage statistics may be obtained using the REST API. The REST query should be an HTTP/HTTPS POST request as follows:
Where:
Get current GPU usage statistics
```
POST /rest-api/gpu/info HTTP/1.1
Host: localhost:8081
Content-Type: application/json
```
```
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json

{
  "cudaVersion": "12.4",
  "nvencVersion": "12.2",
  "driverVersion": "550.90.07",
  "nvmlVersion": "12.550.90.07",
  "numOfAvailableDevices": 1,
  "numOfDecodingSessions": 1,
  "numOfEncodingSessions": 5,
  "numOfHostEncodingSessions": 5,
  "deviceList": [
    {
      "name": "Tesla P100-PCIE-16GB",
      "pciBusId": "0000:02:00.0",
      "computeCapability": "6.0",
      "computeMode": 0,
      "computeModeDescription": "Default compute mode (Multiple contexts allowed per device)",
      "numOfDecodingSessions": 1,
      "numOfEncodingSessions": 5,
      "numOfHostEncodingSessions": 5,
      "fpsStats": {
        "decoderFps": 30,
        "encoderFps": 30
      },
      "nativeEncoderStats": {
        "averageFps": 29,
        "averageLatency": 1225
      },
      "utilizationState": {
        "computeUtilization": 1,
        "memUtilization": 0,
        "decoderUtilization": 5,
        "encoderUtilization": 4,
        "totalMem": "16 GB",
        "freeMem": "15.26 GB",
        "usedMem": "752.25 MB",
        "usedPciBandwidth": "984.76 KB"
      },
      "supportedEncodeCodecs": {
        "H265": { "minWidth": 65, "minHeight": 33, "maxWidth": 4096, "maxHeight": 4096 },
        "H264": { "minWidth": 33, "minHeight": 17, "maxWidth": 4096, "maxHeight": 4096 }
      },
      "supportedDecodeCodecs": {
        "H265": { "minWidth": 144, "minHeight": 144, "maxWidth": 4096, "maxHeight": 4096 },
        "H264": { "minWidth": 48, "minHeight": 16, "maxWidth": 4096, "maxHeight": 4096 }
      }
    }
  ]
}
```
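For example, the per-device load can be summarized from such a response with a short script. The endpoint URL and port below are taken from the example request and may differ in your deployment; summarize_gpu_info is a hypothetical helper, not part of WCS:

```python
import json
import urllib.request

def summarize_gpu_info(stats):
    # Build a one-line load summary per GPU from a /rest-api/gpu/info response
    summary = []
    for dev in stats.get("deviceList", []):
        u = dev["utilizationState"]
        summary.append(
            "{}: encoder {}%, decoder {}%, free memory {}".format(
                dev["name"], u["encoderUtilization"],
                u["decoderUtilization"], u["freeMem"]))
    return summary

# Query a running WCS server (URL and port as in the example request above):
# req = urllib.request.Request("http://localhost:8081/rest-api/gpu/info",
#                              method="POST", data=b"{}",
#                              headers={"Content-Type": "application/json"})
# stats = json.load(urllib.request.urlopen(req))
# print("\n".join(summarize_gpu_info(stats)))
```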
Code | Reason |
---|---|
200 | OK |
500 | Internal server error |
Parameter | Description | Example |
---|---|---|
cudaVersion | CUDA version | 12.4 |
nvencVersion | NVENC version | 12.2 |
driverVersion | NVIDIA drivers version | 550.90.07 |
nvmlVersion | NVIDIA management library version | 12.550.90.07 |
numOfAvailableDevices | GPU available count | 1 |
numOfDecodingSessions | Decoders count | 1 |
numOfEncodingSessions | Encoders count | 5 |
numOfHostEncodingSessions | Encoders count per server | 5 |
name | GPU name | Tesla P100-PCIE-16GB |
pciBusId | GPU PCI bus Id | 0000:02:00.0 |
computeCapability | GPU capabilities Id | 6.0 |
computeMode | GPU compute mode | 0 |
computeModeDescription | GPU compute mode description | Default compute mode (Multiple contexts allowed per device) |
decoderFps | Decoding FPS | 30 |
encoderFps | Encoding FPS | 30 |
averageFps | Average FPS by native GPU stats | 29 |
averageLatency | Average latency in microseconds by native GPU stats | 1225 |
computeUtilization | GPU compute utilization percent | 1 |
memUtilization | Memory utilization percent | 0 |
decoderUtilization | Decoder utilization percent | 5 |
encoderUtilization | Encoder utilization percent | 4 |
totalMem | Total memory amount | 16 GB |
freeMem | Free memory amount | 15.26 GB |
usedMem | Used memory amount | 752.25 MB |
usedPciBandwidth | PCI bus bandwidth used | 984.76 KB |
minWidth | Minimal picture width to decode/encode | 33 |
minHeight | Minimal picture height to decode/encode | 17 |
maxWidth | Maximum picture width to decode/encode | 4096 |
maxHeight | Maximum picture height to decode/encode | 4096 |