Video summarization aims to extract representative frames that retain a video's high-level information. Privacy concerns are growing because conventional large-scale training requires users to upload video samples, which may inevitably expose sensitive information. In this paper, we thoroughly discuss the Federated Video Summarization problem, i.e., how to obtain a robust video summarization model when video data is distributed across private data islands. Our key contributions are: 1) we propose a fundamental Frame-Based aggregation method for video-related tasks, which differs from the sample-based aggregation in conventional FedAvg; 2) to mitigate the heterogeneous distributions arising from community diversity, we propose the Community-Aware Clustering Federated Video Summarization framework (CFed-VS), which clusters clients via a novel data-driven clustering approach; 3) we further tackle the challenging non-IID setting with a proposed Mixture Transformer, which achieves state-of-the-art performance in extensive quantitative and qualitative experiments on the TVSum and SumMe datasets.
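As a minimal sketch of the distinction the abstract draws between sample-based and frame-based aggregation: FedAvg-style averaging can weight each client's model update by its total frame count rather than its sample (video) count. All function names and weights below are illustrative assumptions; the paper's actual aggregation rule may differ.

```python
# Sketch: sample-based vs. frame-based weighted averaging of client model
# parameters, in the spirit of FedAvg. All names here are hypothetical;
# this is not the paper's exact implementation.

def weighted_average(client_params, weights):
    """Average each parameter position across clients, weighted by `weights`."""
    total = sum(weights)
    n_params = len(client_params[0])
    return [
        sum(w * params[i] for params, w in zip(client_params, weights)) / total
        for i in range(n_params)
    ]

def fedavg_sample_based(client_params, num_videos):
    # Conventional FedAvg: weight each client by its number of samples (videos).
    return weighted_average(client_params, num_videos)

def fedavg_frame_based(client_params, num_frames):
    # Frame-based variant: weight each client by its total number of frames,
    # so clients holding longer videos contribute proportionally more.
    return weighted_average(client_params, num_frames)
```

For example, two clients with one video each but very different video lengths receive equal weight under sample-based aggregation, while frame-based aggregation shifts the global model toward the client with more frames.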
@INPROCEEDINGS{10191101,
author={Wan, Fan and Wang, Junyan and Duan, Haoran and Song, Yang and Pagnucco, Maurice and Long, Yang},
booktitle={2023 International Joint Conference on Neural Networks (IJCNN)},
title={Community-Aware Federated Video Summarization},
year={2023},
pages={1-8},
keywords={Training;Privacy;Federated learning;Neural networks;Distributed databases;Benchmark testing;Transformers;Federated Learning;Vision Transformer;Video Summarization},
doi={10.1109/IJCNN54540.2023.10191101}}