Catherine Pelachaud: Director of Research, French National Centre for Scientific Research (CNRS)
She is a French computer scientist specializing in human-computer interaction. Her work focuses on Embodied Conversational Agents (ECAs) and on models of agents' socio-emotional and non-verbal behavior, including facial expressions, gaze, and gestures.
Prof. Pelachaud is well known for her work on Greta, an interactive ECA platform that allows controlling the multimodal behaviors of an ECA in real time.
Below is a transcript of the Q&A session.
When creating a computational model, which is an abstraction over data, is there a risk that we are universalizing human behavior? In a sense, it becomes less variable, less interesting. Do you think this is an issue, or not at all?
It’s a very tough question; there is no simple answer. On the one hand, you want to equip the agent with more and more socio-emotional capabilities, and also knowledge. On the other hand, for some applications you want a long-term interaction, and you want to build rapport, which is quite difficult to do if you don’t equip the agent with identity features. Having an identity implies quite a lot: you have to work on what differs from this agent to another agent. I am not talking about the appearance, but the way it moves or the way it responds. That is being studied by several other colleagues, who work on capturing the style of a person. Quite a lot of work has been done, and it only scratches the surface of what needs to be simulated – individuality in the agent. I don’t think you need these kinds of features in every case, but they could be interesting for some specific applications.
So basically, you can capture individuality?
We are working towards that. It’s not that we can capture it yet, but we are working towards it.
Thank you for an interesting talk. I have been struggling with something. On the one hand, there are approaches like Stefan Kopp’s, Justine Cassell’s, and yours, where you work on understanding the mechanisms and let the mechanisms generate the agent’s behavior. On the other hand, you have an entirely data-driven AI approach, where things automatically look more natural, but you have absolutely no insight into the mechanisms. Ten years ago or so, the first approach would automatically work best, because there was not really an alternative; now, if you look at what happens in industry, the AI approach looks the most impressive. So I would like to know from you: where do you see human-computer, human-agent interaction moving? Is it the AI approach, where it looks fantastic but you have no idea what is happening, or a more cognitive-science approach, where you have a thorough insight into the mechanisms?
I didn’t show it in the presentation, but we are working on machine learning approaches to generating facial expressions, prosody, head movements, and hand gestures for the agent. Sometimes you need to bridge those two types of approaches. For example, capturing hand shape using a purely machine learning approach, without any information on the semantics you want to convey, has so far not given very good results. It could be that this changes in a few years and I am wrong; the field is moving so fast that one may be able to develop models that can do that. At the moment, incorporating the symbolic approach into machine learning approaches could be very promising. A lot of models are being developed these days to compute a behavior or an utterance for an agent, but you want an agent in an interaction, and there, if you don’t have models of the goals and beliefs of the agent – a cognitive model of the agent and of the user’s mental states – it is going to be very difficult for an interaction to take place. I think you need more work.

For example, in the piece of work that I presented, the adaptation mechanism is based on machine learning – you learn from data and you see how the agent should adapt. I didn’t mention it, but this model learned that the agent should from time to time enter an imitation phase. In one interaction study, we put participants in front of a human-sized virtual agent presented on a large screen. Participants had to interact with the virtual agent, and at some point the agent did not move any longer. The agent had learned to reach an equilibrium in its behaviors, because the participants were not showing many behaviors either. During the interaction the agent stopped moving, and we were like, ‘What do you mean? We spent so many years developing these models, and here is what we got!’ It was not enough; we need to equip the agent with some reasoning. So I think both approaches are very important, as is understanding where you are going with the interaction, to make sure the interaction moves in the direction you want.
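The equilibrium failure she describes can be illustrated with a minimal, hypothetical sketch; this is not the actual model from the talk, and the names and the update rule below are illustrative assumptions. If the agent simply nudges its expressivity toward the user's observed expressivity at every turn, a near-motionless user pulls the agent toward immobility.

# Hypothetical sketch, not the model from the talk: a naive imitation-based
# adaptation loop. The agent moves its expressivity a small step toward the
# user's at each turn; with an inexpressive user, both settle near zero,
# i.e. the agent "stops moving".

def adapt_expressivity(agent_level: float, user_level: float,
                       rate: float = 0.2) -> float:
    """One adaptation step: nudge the agent toward the user's expressivity."""
    return agent_level + rate * (user_level - agent_level)

agent = 1.0   # the agent starts fully expressive
user = 0.05   # a user who barely moves or gestures

for turn in range(30):
    agent = adapt_expressivity(agent, user)

print(f"Agent expressivity after 30 turns: {agent:.3f}")  # ~0.05, near immobility

Nothing in such a purely data-driven rule says "keep the interaction alive", which is why the answer above argues the agent also needs reasoning about its goals.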
What are some of the ethical implications of incorporating the touch modality into AI, along with the integration of other modalities?
Yes, this is why I was mentioning that you have to take into account this notion of touch avoidance. You can see it in human-human interaction: if you are interacting with a person and you want to touch them, you have to come closer and enter their intimate space. You can see through gaze and posture whether the person has anticipated you coming closer, or you may touch them and they may avoid it. With virtual agent technology, it is not so easy to capture these behavioral differences, but we absolutely need to take them into account to ensure that we are not going against some users’ preferences. The agent always has to adapt to the user. So it is not only touch but also gazing behaviors; the agent has to take users into account.
Regarding social touch, you have to take into account so many things: context, gender, cultural differences, what people are used to, and so on. I was curious: how do people perceive agents? Would they want to touch them? How likely are people to be the ones who initiate touch?
When we first started working on this, we had an earlier device, and we wanted to see how people would perceive the types of touch it simulated. We brought participants into a CAVE, an immersive 3D virtual environment, where they interacted with the agent and recognized different types of touch. That was with students, and we had really positive feedback. But again, it was not a real interaction; there was no goal except being touched, and they were playing quite a lot. Some participants really wanted to hug the virtual agent, which was not possible. When you embed the agent in the CAVE, the agent is much more present; it really has an effect. So this is something we want to study more.
A deep dive into the topic:
Lugrin, B. (2021). Introduction to Socially Interactive Agents. In The Handbook on Socially Interactive Agents: 20 Years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics, Volume 1: Methods, Behavior, Cognition (pp. 1–20). [link to chapters]