1

PIVOT: Prompting for Video Continual Learning

We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.

Andrés Villa, Juan León Alcázar, Motasem Alfarra, Kumail Alhamoud, Julio Hurtado, Fabian Caba Heilbron, Alvaro Soto, Bernard Ghanem

vCLIMB: A Novel Video Class Incremental Learning Benchmark

vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning. We perform in-depth evaluations of existing CL methods in vCLIMB, and observe two unique challenges in video data. The selection of instances to store in episodic memory is performed at the frame level. Second, untrimmed training data influences the effectiveness of frame sampling strategies.

Andrés Villa, Kumail Alhamoud, Juan León Alcázar, Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations.

Vladimir Araujo, Andrés Villa, Marcelo Mendoza, Marie-Francine Moens, Alvaro Soto

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

In this paper, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model. Specifically, we formulate a text-based task conditioner to adapt video features to the few-shot learning task.

Andrés Villa, Juan Manuel Perez Rua, Vladimir Araujo, Juan Carlos Niebles, Victor Escorcia, Alvaro Soto

Interpretable Contextual Team-aware Item Recommendation: Application in Multiplayer Online Battle Arena Game

We develop TTIR, a contextual recommender model derived from the Transformer neural architecture that suggests a set of items to every team member, based on the contexts of teams and roles that describe the match. TTIR outperforms several approaches and provides interpretable recommendations through visualization of attention weights.

Andrés Villa, Vladimir Araujo, Francisca Cattan, Denis Parra

Interpretable Contextual Team-aware Item Recommendation: Application in Multiplayer Online Battle Arena Game