Video Multi-Method Assessment Fusion (VMAF)

Module Interface

class torchmetrics.video.vmaf.VideoMultiMethodAssessmentFusion(features=False, **kwargs)[source]

Calculates the Video Multi-Method Assessment Fusion (VMAF) metric.

VMAF is a full-reference video quality assessment algorithm that combines multiple quality assessment features such as detail loss, motion, and contrast using a machine learning model to predict human perception of video quality more accurately than traditional metrics like PSNR or SSIM.

The metric works by:

  1. Converting input videos to luma component (grayscale)

  2. Computing multiple elementary features:

     • Additive Detail Measure (ADM): Evaluates detail preservation at different scales

     • Visual Information Fidelity (VIF): Measures preservation of visual information across frequency bands

     • Motion: Quantifies the amount of motion in the video

  3. Combining these features using a trained SVM model to predict quality
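Step 1 can be illustrated with a standard BT.601 luma conversion. This is an illustrative sketch only; the exact conversion inside vmaf-torch may differ (for example, it may use integer arithmetic on 8-bit values):

```python
import torch

def rgb_to_luma(video: torch.Tensor) -> torch.Tensor:
    """Convert an RGB video of shape (batch, 3, frames, H, W) in [0, 1] to luma.

    Uses standard BT.601 coefficients; this is a sketch, not necessarily the
    exact conversion performed by vmaf-torch.
    """
    r, g, b = video[:, 0], video[:, 1], video[:, 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

video = torch.rand(2, 3, 10, 32, 32)
luma = rgb_to_luma(video)  # shape (2, 10, 32, 32), values stay in [0, 1]
```

Because the three coefficients sum to 1, inputs in [0, 1] yield luma values in [0, 1].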

Note

This implementation requires vmaf-torch to be installed: https://github.com/alvitrioliks/VMAF-torch. Install it either by cloning the repository and running pip install . or with pip install torchmetrics[video].

As input to forward and update the metric accepts the following input:

  • preds (Tensor): Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].

  • target (Tensor): Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].

As output of forward and compute the metric returns the following output, vmaf (Tensor):

  • If features is False, returns a tensor with shape (batch, frame) of VMAF score for each frame in each video. Higher scores indicate better quality, with typical values ranging from 0 to 100.

  • If features is True, returns a dictionary where each value is a (batch, frame) tensor of the corresponding feature. The keys are:

    • ‘integer_motion2’: Integer motion feature

    • ‘integer_motion’: Integer motion feature

    • ‘integer_adm2’: Integer ADM feature

    • ‘integer_adm_scale0’: Integer ADM feature at scale 0

    • ‘integer_adm_scale1’: Integer ADM feature at scale 1

    • ‘integer_adm_scale2’: Integer ADM feature at scale 2

    • ‘integer_adm_scale3’: Integer ADM feature at scale 3

    • ‘integer_vif_scale0’: Integer VIF feature at scale 0

    • ‘integer_vif_scale1’: Integer VIF feature at scale 1

    • ‘integer_vif_scale2’: Integer VIF feature at scale 2

    • ‘integer_vif_scale3’: Integer VIF feature at scale 3

    • ‘vmaf’: VMAF score for each frame in each video

Parameters:
  • features (bool) – If True, all the elementary features (ADM, VIF, motion) are returned along with the VMAF score in a dictionary. This corresponds to the output you would get from the VMAF command line tool with the --csv option enabled. If False, only the VMAF score is returned as a tensor.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Raises:
  • ModuleNotFoundError – If vmaf-torch is not installed.

Example

>>> import torch
>>> from torchmetrics.video import VideoMultiMethodAssessmentFusion
>>> # 2 videos, 3 channels, 10 frames, 32x32 resolution
>>> preds = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(42))
>>> target = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(43))
>>> vmaf = VideoMultiMethodAssessmentFusion()
>>> torch.round(vmaf(preds, target), decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800, 13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500, 14.4100, 12.4700, 14.8200]])
>>> vmaf = VideoMultiMethodAssessmentFusion(features=True)
>>> vmaf_dict = vmaf(preds, target)
>>> vmaf_dict['vmaf'].round(decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800, 13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500, 14.4100, 12.4700, 14.8200]])
>>> vmaf_dict['integer_adm2'].round(decimals=2)
tensor([[0.4500, 0.4500, 0.3600, 0.4700, 0.4300, 0.3600, 0.3900, 0.4100, 0.3700, 0.4700],
        [0.4200, 0.3900, 0.4400, 0.3700, 0.4500, 0.3900, 0.3800, 0.4800, 0.3900, 0.3900]])
compute()[source]

Compute final VMAF score.

Return type:

Union[Tensor, Dict[str, Tensor]]

update(preds, target)[source]

Update state with predictions and targets.

Return type:

None

Functional Interface

torchmetrics.functional.video.vmaf.video_multi_method_assessment_fusion(preds, target, features=False)[source]

Calculates the Video Multi-Method Assessment Fusion (VMAF) metric.

VMAF is a full-reference video quality assessment algorithm that combines multiple quality assessment features such as detail loss, motion, and contrast using a machine learning model to predict human perception of video quality more accurately than traditional metrics like PSNR or SSIM.

The metric works by:

  1. Converting input videos to luma component (grayscale)

  2. Computing multiple elementary features:

     • Additive Detail Measure (ADM): Evaluates detail preservation at different scales

     • Visual Information Fidelity (VIF): Measures preservation of visual information across frequency bands

     • Motion: Quantifies the amount of motion in the video

  3. Combining these features using a trained SVM model to predict quality

Note

This implementation requires vmaf-torch to be installed: https://github.com/alvitrioliks/VMAF-torch. Install it either by cloning the repository and running pip install . or with pip install torchmetrics[video].

Parameters:
  • preds (Tensor) – Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].

  • target (Tensor) – Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].

  • features (bool) – If True, all the elementary features (ADM, VIF, motion) are returned along with the VMAF score in a dictionary. This corresponds to the output you would get from the VMAF command line tool with the --csv option enabled. If False, only the VMAF score is returned as a tensor.

Return type:

Union[Tensor, Dict[str, Tensor]]

Returns:

  • If features is False, returns a tensor with shape (batch, frame) of VMAF score for each frame in each video. Higher scores indicate better quality, with typical values ranging from 0 to 100.

  • If features is True, returns a dictionary where each value is a (batch, frame) tensor of the corresponding feature. The keys are:

    • ‘integer_motion2’: Integer motion feature

    • ‘integer_motion’: Integer motion feature

    • ‘integer_adm2’: Integer ADM feature

    • ‘integer_adm_scale0’: Integer ADM feature at scale 0

    • ‘integer_adm_scale1’: Integer ADM feature at scale 1

    • ‘integer_adm_scale2’: Integer ADM feature at scale 2

    • ‘integer_adm_scale3’: Integer ADM feature at scale 3

    • ‘integer_vif_scale0’: Integer VIF feature at scale 0

    • ‘integer_vif_scale1’: Integer VIF feature at scale 1

    • ‘integer_vif_scale2’: Integer VIF feature at scale 2

    • ‘integer_vif_scale3’: Integer VIF feature at scale 3

    • ‘vmaf’: VMAF score for each frame in each video

Example

>>> import torch
>>> from torchmetrics.functional.video import video_multi_method_assessment_fusion
>>> # 2 videos, 3 channels, 10 frames, 32x32 resolution
>>> preds = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(42))
>>> target = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(43))
>>> vmaf_score = video_multi_method_assessment_fusion(preds, target)
>>> torch.round(vmaf_score, decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800, 13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500, 14.4100, 12.4700, 14.8200]])
>>> vmaf_dict = video_multi_method_assessment_fusion(preds, target, features=True)
>>> # show a couple of features, more features are available
>>> vmaf_dict['vmaf'].round(decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800, 13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500, 14.4100, 12.4700, 14.8200]])
>>> vmaf_dict['integer_adm2'].round(decimals=2)
tensor([[0.4500, 0.4500, 0.3600, 0.4700, 0.4300, 0.3600, 0.3900, 0.4100, 0.3700, 0.4700],
        [0.4200, 0.3900, 0.4400, 0.3700, 0.4500, 0.3900, 0.3800, 0.4800, 0.3900, 0.3900]])