Publications

CVPR 2026 Highlight ★

How to Take a Memorable Picture? Empowering Users with Actionable Feedback

Francesco Laiti, Davide Talon, Jacopo Staiano, Elisa Ricci

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv Code Website
Abstract

Image memorability, i.e., how likely an image is to be remembered, has traditionally been studied in computer vision either as a passive prediction task, with models regressing a scalar score, or with generative methods that alter the visual input to boost the image's likelihood of being remembered. Yet neither paradigm supports users at capture time, when the crucial question is how to improve a photo's memorability. We introduce the task of Memorability Feedback (MemFeed), where an automated model should provide actionable, human-interpretable guidance to users with the goal of enhancing an image's future recall. We also present MemCoach, the first approach designed to provide concrete suggestions in natural language for memorability improvement (e.g., “emphasize facial expression,” “bring the subject forward”). Our method, based on Multimodal Large Language Models (MLLMs), is training-free and employs a teacher-student steering strategy, aligning the model's internal activations toward more memorable patterns learned from a teacher model progressing along least-to-most memorable samples. To enable systematic evaluation on this novel task, we further introduce MemBench, a new benchmark featuring sequence-aligned photoshoots with annotated memorability scores. Our experiments, considering multiple MLLMs, demonstrate the effectiveness of MemCoach, showing consistently improved performance over several zero-shot models. The results indicate that memorability can not only be predicted but also taught and instructed, shifting the focus from mere prediction to actionable feedback for human creators.
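As a rough illustration of what training-free activation steering can look like, the toy sketch below shifts a hidden state along a direction computed from two sets of activations. The difference-of-means direction, the function names `steering_vector` and `steer`, the strength `alpha`, and the random stand-in activations are all assumptions made for illustration; the abstract does not specify how MemCoach computes or applies its steering signal.

```python
import numpy as np


def steering_vector(low_acts, high_acts):
    """Difference of mean activations (assumed rule): high minus low memorability."""
    return high_acts.mean(axis=0) - low_acts.mean(axis=0)


def steer(hidden, direction, alpha=1.0):
    """Shift a hidden state along the steering direction with strength alpha."""
    return hidden + alpha * direction


rng = np.random.default_rng(0)
# Hypothetical stand-ins for activations collected on less and more memorable samples.
low_acts = rng.normal(0.0, 1.0, size=(32, 8))
high_acts = low_acts + 2.0  # toy data: the "memorable" set is offset by a constant

direction = steering_vector(low_acts, high_acts)
hidden = np.zeros(8)
steered = steer(hidden, direction, alpha=0.5)
print(direction.round(3), steered.round(3))
```

No weights are updated here: the only change is an additive shift applied to activations at inference time, which is what makes this family of methods training-free.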

ICPR 2024

Conditioned Prompt-Optimization for Continual Deepfake Detection

Francesco Laiti, Benedetta Liberatori, Thomas De Min, Elisa Ricci

27th International Conference on Pattern Recognition (ICPR)

arXiv Code
Abstract

The rapid advancement of generative models has significantly enhanced the realism and customization of digital content creation. The increasing power of these tools, coupled with their ease of access, fuels the creation of photorealistic fake content, termed deepfakes, which raises substantial concerns about its potential misuse. In response, there has been notable progress in developing detection mechanisms to identify content produced by these advanced systems. However, existing methods often struggle to adapt to the continuously evolving landscape of deepfake generation. This paper introduces Prompt2Guard, a novel solution for exemplar-free continual deepfake detection of images that leverages Vision-Language Models (VLMs) and domain-specific multimodal prompts. Compared to previous VLM-based approaches, which are either bounded by prompt selection accuracy or necessitate multiple forward passes, we leverage a prediction ensembling technique with read-only prompts. Read-only prompts do not interact with the VLM's internal representation, removing the need for multiple forward passes. Thus, we enhance both efficiency and accuracy in detecting generated content. Additionally, our method exploits text-prompt conditioning tailored to deepfake detection, which we demonstrate is beneficial in our setting. We evaluate Prompt2Guard on CDDB-Hard, a continual deepfake detection benchmark composed of five deepfake detection datasets spanning multiple domains and generators, achieving a new state of the art.
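The idea of ensembling predictions across several prompts can be sketched in a few lines. This is a minimal toy, assuming the per-prompt logits for one image are already available and that the ensemble is a plain average of softmax probabilities; the abstract does not state the actual ensembling rule used by Prompt2Guard, and `ensemble_predict` and the example logits are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(per_prompt_logits):
    """Average class probabilities over prompts.

    per_prompt_logits has shape (num_prompts, num_classes).
    """
    return softmax(per_prompt_logits).mean(axis=0)


# Hypothetical logits for one image from three domain-specific prompts,
# with class 0 = real and class 1 = fake.
logits = np.array([[2.0, 0.0], [0.5, 0.5], [1.0, 2.0]])
avg = ensemble_predict(logits)
label = int(avg.argmax())
print(avg.round(4), label)
```

Averaging avoids having to select a single prompt per image, so a wrong prompt choice cannot by itself determine the final prediction.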

MDPI 2023

Identifying Synthetic Faces through GAN Inversion and Biometric Traits Analysis

Cecilia Pasquini, Francesco Laiti, Davide Lobba, Giovanni Ambrosi, Giulia Boato, Francesco De Natale

MDPI Applied Sciences

DOI
Abstract

In the field of image forensics, notable attention has recently been paid to the detection of synthetic content created through Generative Adversarial Networks (GANs), especially face images. This work explores a classification methodology inspired by the inner architecture of typical GANs, where vectors in a low-dimensional latent space are transformed by the generator into meaningful high-dimensional images. In particular, the proposed detector exploits the inversion of the GAN synthesis process: given a face image under investigation, we identify the point in the GAN latent space that most closely reconstructs it, project that vector back into the image space, and compare the resulting image with the actual one. Through experimental tests on widely known datasets (including FFHQ, CelebA, LFW, and Caltech), we demonstrate that real faces can be accurately discriminated from GAN-generated ones by properly capturing the facial traits through different feature representations. In particular, features based on facial landmarks fed to a Support Vector Machine consistently yield a global accuracy above 88% for each dataset. Furthermore, we experimentally show that the proposed detector is robust to routinely applied post-processing operations.
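The invert, re-project, and compare loop described above can be illustrated with a deliberately simplified toy. Here a random linear map `W` stands in for the GAN generator, least squares stands in for latent-space optimization, and raw reconstruction error stands in for the landmark-based features and SVM used in the paper; all of these substitutions, and every name in the code, are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "generator": a fixed linear map from a 4-d latent space to a 64-d image.
W = rng.normal(size=(64, 4))


def generate(z):
    """Map a latent vector to image space."""
    return W @ z


def invert(image):
    """Find the latent vector whose generated image is closest to the input."""
    z, *_ = np.linalg.lstsq(W, image, rcond=None)
    return z


def reconstruction_error(image):
    """Distance between an image and its re-projection through the generator."""
    return float(np.linalg.norm(image - generate(invert(image))))


fake = generate(rng.normal(size=4))  # lies in the generator's range
real = rng.normal(size=64)           # arbitrary vector, generally outside the range
err_fake = reconstruction_error(fake)
err_real = reconstruction_error(real)
print(err_fake, err_real)
```

In this toy, an image produced by the generator is reconstructed almost exactly, while an arbitrary image is not. A real detector would compare richer feature representations of the two images rather than a single norm.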

STAR 2023

Meta-Trainer: An Augmented Reality Trainer for Home Fitness with Real-Time Feedback

Lorenzo Orlandi, Giulia Martinelli, Francesco Laiti, Davide Lobba, Niccolò Bisagno, Nicola Conci

IEEE International Workshop on Sport, Technology and Research (STAR)

Code DOI
Abstract

Meta-Trainer allows people to train on their own, supervised by a virtual trainer that provides exercise samples and real-time feedback. The system relies on a pair of smart glasses for Augmented Reality (AR) to let users interact with the virtual trainer in a hybrid environment. Given a target exercise, the virtual trainer demonstrates the movements to be reproduced. Unlike video-based applications, the system lets the user move around the virtual trainer and view the exercise from different angles, for a more immersive and realistic experience. The AR glasses can also track head movements, which are then used to give feedback and track training progress. The system is an advancement over video-only trainers because it does not require the user to look at a screen while performing the exercise.