Video Multi-Method Assessment Fusion (VMAF)
Module Interface
- class torchmetrics.video.vmaf.VideoMultiMethodAssessmentFusion(features=False, **kwargs)[source]
Calculates the Video Multi-Method Assessment Fusion (VMAF) metric.
VMAF is a full-reference video quality assessment algorithm that combines multiple quality assessment features such as detail loss, motion, and contrast using a machine learning model to predict human perception of video quality more accurately than traditional metrics like PSNR or SSIM.
The metric works by:

1. Converting the input videos to their luma component (grayscale)
2. Computing multiple elementary features:
   - Additive Detail Measure (ADM): evaluates detail preservation at different scales
   - Visual Information Fidelity (VIF): measures preservation of visual information across frequency bands
   - Motion: quantifies the amount of motion in the video
3. Combining these features using a trained SVM model to predict quality
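The conversion to the luma component can be sketched with the standard BT.601 weights. This is an illustrative, stdlib-only sketch; the exact conversion used by vmaf-torch (e.g. integer precision or studio-range scaling) may differ:

```python
def rgb_to_luma(r: float, g: float, b: float) -> float:
    """Convert one RGB pixel (values in [0, 1]) to its luma value using
    the BT.601 weights. VMAF operates on the luma (grayscale) plane only,
    so chroma information is discarded before any feature is computed.
    """
    return 0.299 * r + 0.587 * g + 0.114 * b


def video_to_luma(video):
    """Apply the per-pixel conversion to a nested-list 'video' shaped
    (frames, height, width, 3). A real implementation would vectorize
    this over a (batch, channels, frames, height, width) tensor instead.
    """
    return [
        [[rgb_to_luma(*px) for px in row] for row in frame]
        for frame in video
    ]
```

The three weights sum to 1, so a white pixel maps to luma 1.0 and a black pixel to 0.0.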
Note
This implementation requires you to have vmaf-torch installed: https://github.com/alvitrioliks/VMAF-torch. Install it either by cloning the repository and running pip install . or with pip install torchmetrics[video].
As input to forward and update the metric accepts the following input:

- preds (Tensor): Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].
- target (Tensor): Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].

As output of forward and compute the metric returns the following output:

- vmaf (Tensor): If features is False, returns a tensor with shape (batch, frame) of VMAF scores for each frame in each video. Higher scores indicate better quality, with typical values ranging from 0 to 100. If features is True, returns a dictionary where each value is a (batch, frame) tensor of the corresponding feature. The keys are:
'integer_motion2': Integer motion feature
'integer_motion': Integer motion feature
'integer_adm2': Integer ADM feature
'integer_adm_scale0': Integer ADM feature at scale 0
'integer_adm_scale1': Integer ADM feature at scale 1
'integer_adm_scale2': Integer ADM feature at scale 2
'integer_adm_scale3': Integer ADM feature at scale 3
'integer_vif_scale0': Integer VIF feature at scale 0
'integer_vif_scale1': Integer VIF feature at scale 1
'integer_vif_scale2': Integer VIF feature at scale 2
'integer_vif_scale3': Integer VIF feature at scale 3
'vmaf': VMAF score for each frame in each video
- Parameters:
features (bool) – If True, all the elementary features (ADM, VIF, motion) are returned along with the VMAF score in a dictionary. This corresponds to the output you would get from the VMAF command line tool with the --csv option enabled. If False, only the VMAF score is returned as a tensor.
kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
- Raises:
RuntimeError – If vmaf-torch is not installed.
ValueError – If features is not a boolean.
Example
>>> import torch
>>> from torchmetrics.video import VideoMultiMethodAssessmentFusion
>>> # 2 videos, 3 channels, 10 frames, 32x32 resolution
>>> preds = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(42))
>>> target = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(43))
>>> vmaf = VideoMultiMethodAssessmentFusion()
>>> torch.round(vmaf(preds, target), decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800,
         13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500,
         14.4100, 12.4700, 14.8200]])
>>> vmaf = VideoMultiMethodAssessmentFusion(features=True)
>>> vmaf_dict = vmaf(preds, target)
>>> vmaf_dict['vmaf'].round(decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800,
         13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500,
         14.4100, 12.4700, 14.8200]])
>>> vmaf_dict['integer_adm2'].round(decimals=2)
tensor([[0.4500, 0.4500, 0.3600, 0.4700, 0.4300, 0.3600, 0.3900, 0.4100,
         0.3700, 0.4700],
        [0.4200, 0.3900, 0.4400, 0.3700, 0.4500, 0.3900, 0.3800, 0.4800,
         0.3900, 0.3900]])
Functional Interface
- torchmetrics.functional.video.vmaf.video_multi_method_assessment_fusion(preds, target, features=False)[source]
Calculates the Video Multi-Method Assessment Fusion (VMAF) metric.
VMAF is a full-reference video quality assessment algorithm that combines multiple quality assessment features such as detail loss, motion, and contrast using a machine learning model to predict human perception of video quality more accurately than traditional metrics like PSNR or SSIM.
The metric works by:

1. Converting the input videos to their luma component (grayscale)
2. Computing multiple elementary features:
   - Additive Detail Measure (ADM): evaluates detail preservation at different scales
   - Visual Information Fidelity (VIF): measures preservation of visual information across frequency bands
   - Motion: quantifies the amount of motion in the video
3. Combining these features using a trained SVM model to predict quality
Note
This implementation requires you to have vmaf-torch installed: https://github.com/alvitrioliks/VMAF-torch. Install either by cloning the repository and running pip install . or with pip install torchmetrics[video].
- Parameters:
preds (Tensor) – Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].
target (Tensor) – Video tensor of shape (batch, channels, frames, height, width). Expected to be in RGB format with values in range [0, 1].
features (bool) – If True, all the elementary features (ADM, VIF, motion) are returned along with the VMAF score in a dictionary. This corresponds to the output you would get from the VMAF command line tool with the --csv option enabled. If False, only the VMAF score is returned as a tensor.
- Return type:
Union[Tensor, Dict[str, Tensor]]
- Returns:
If features is False, returns a tensor with shape (batch, frame) of VMAF score for each frame in each video. Higher scores indicate better quality, with typical values ranging from 0 to 100.
If features is True, returns a dictionary where each value is a (batch, frame) tensor of the corresponding feature. The keys are:
'integer_motion2': Integer motion feature
'integer_motion': Integer motion feature
'integer_adm2': Integer ADM feature
'integer_adm_scale0': Integer ADM feature at scale 0
'integer_adm_scale1': Integer ADM feature at scale 1
'integer_adm_scale2': Integer ADM feature at scale 2
'integer_adm_scale3': Integer ADM feature at scale 3
'integer_vif_scale0': Integer VIF feature at scale 0
'integer_vif_scale1': Integer VIF feature at scale 1
'integer_vif_scale2': Integer VIF feature at scale 2
'integer_vif_scale3': Integer VIF feature at scale 3
'vmaf': VMAF score for each frame in each video
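Since the metric returns per-frame scores, a single per-video number requires a pooling step. Mean pooling, sketched below over plain nested lists standing in for a (batch, frame) tensor, is a common choice, though which pooling rule matches a given reference implementation is an assumption:

```python
from statistics import mean


def pool_per_video(scores):
    """Collapse a (batch, frame) table of per-frame VMAF scores into one
    mean score per video. `scores` is a list of per-video score lists,
    mirroring the (batch, frame) tensor returned by the metric.
    """
    return [mean(video_scores) for video_scores in scores]
```

With a real tensor, the equivalent operation would be a mean over the frame dimension (e.g. vmaf_score.mean(dim=1)).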
Example
>>> import torch
>>> from torchmetrics.functional.video import video_multi_method_assessment_fusion
>>> # 2 videos, 3 channels, 10 frames, 32x32 resolution
>>> preds = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(42))
>>> target = torch.rand(2, 3, 10, 32, 32, generator=torch.manual_seed(43))
>>> vmaf_score = video_multi_method_assessment_fusion(preds, target)
>>> torch.round(vmaf_score, decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800,
         13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500,
         14.4100, 12.4700, 14.8200]])
>>> vmaf_dict = video_multi_method_assessment_fusion(preds, target, features=True)
>>> # show a couple of features, more features are available
>>> vmaf_dict['vmaf'].round(decimals=2)
tensor([[ 9.9900, 15.9000, 14.2600, 16.6100, 15.9100, 14.3000, 13.5800,
         13.4900, 15.4700, 20.2800],
        [ 6.2500, 11.3000, 17.3000, 11.4600, 19.0600, 14.9300, 14.0500,
         14.4100, 12.4700, 14.8200]])
>>> vmaf_dict['integer_adm2'].round(decimals=2)
tensor([[0.4500, 0.4500, 0.3600, 0.4700, 0.4300, 0.3600, 0.3900, 0.4100,
         0.3700, 0.4700],
        [0.4200, 0.3900, 0.4400, 0.3700, 0.4500, 0.3900, 0.3800, 0.4800,
         0.3900, 0.3900]])