I have always stood out for my responsibility, perseverance, strong academic record, and solid technical and research skills. These have earned me several academic distinctions, including an Erasmus Mundus scholarship and an ANID scholarship awarded by the Chilean government to pursue a PhD in Computer Science. I am currently a PhD candidate in the IALab group at PUC, passionate about computer vision. I am trying to figure out how to give machines and artificial agents the ability to reason about and see the world as human beings do. Inspired by this, I focus on learning to recognize novel human actions from few samples and multimodal information, as humans do.
Download my résumé.
PhD in Computer Science, (March 2019 - Present)
Pontificia Universidad Católica de Chile
BSc in Electronic Engineering, (February 2012 - March 2017)
Universidad del Norte, Barranquilla
Academic Exchange, (September 2015 - July 2016)
Universidad Politécnica de Madrid
We introduce PIVOT, a novel method that leverages the extensive knowledge in pre-trained models from the image domain, thereby reducing the number of trainable parameters and the associated forgetting. Unlike previous methods, ours is the first approach that effectively uses prompting mechanisms for continual learning without any in-domain pre-training. Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
vCLIMB is a standardized test-bed to analyze catastrophic forgetting of deep models in video continual learning. We perform in-depth evaluations of existing CL methods in vCLIMB and observe two challenges unique to video data. First, the selection of instances to store in episodic memory is performed at the frame level. Second, untrimmed training data influences the effectiveness of frame sampling strategies.
In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations.
In this paper, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model. Specifically, we formulate a text-based task conditioner to adapt video features to the few-shot learning task.
We develop TTIR, a contextual recommender model derived from the Transformer neural architecture that suggests a set of items to every team member, based on the contexts of teams and roles that describe the match. TTIR outperforms several approaches and provides interpretable recommendations through visualization of attention weights.