The newly launched EMO AI, an artificial intelligence tool from the Alibaba group, has caused a stir online. The tool produces videos from photographs, making the avatar sing songs in different languages with surprising realism.
This area of AI has advanced significantly over the years. It first appeared in photo-editing apps, which let users swap faces with someone else, add filters and use other features. Now the simulation of actions in video has drawn great interest from internet users, mainly because of how realistic the results are.
What is EMO AI?
EMO AI (Emote Portrait Alive) is a tool that uses a diffusion model to create ultra-realistic audiovisual content from audio and images.
The AI focuses on accurately and faithfully simulating the subject's facial features in any situation that can be imagined from a photograph. For example, it can produce a music video from a person's photograph in which the character sings a popular song, moving their lips and facial muscles to pronounce the lyrics correctly, with realistic poses.
According to the developers at the Institute for Intelligent Computing, part of the Alibaba group, videos created by EMO AI can be of any duration, depending only on the length of the audio file used to produce the video.
How does the AI work?
According to the methodology described by the researchers, EMO needs only a single image as the basis for the video generated by its artificial intelligence. You then supply an audio file so the tool can create the video with the character's performance, synchronizing the lip movements with the lyrics and animating the person for as long as the audio lasts.
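Because the video runs for as long as the audio does, the number of frames to generate follows directly from the audio length. The sketch below illustrates that arithmetic only. It is not EMO's code: the function names and the default of 30 frames per second are assumptions made for the example, and the audio is read with Python's standard `wave` module.

```python
import wave


def video_frame_count(n_samples: int, sample_rate: int, fps: int = 30) -> int:
    """Frames needed to cover the whole audio clip, rounded up.

    The 30 fps default is an assumption for illustration only.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if n_samples < 0 or sample_rate <= 0 or fps <= 0:
        raise ValueError("n_samples must be >= 0; sample_rate and fps must be > 0")
    # Ceiling division: -(-a // b) rounds a / b up to the next integer.
    return -(-(n_samples * fps) // sample_rate)


def wav_frame_count(path: str, fps: int = 30) -> int:
    """Read an uncompressed WAV file and return the matching video frame count."""
    with wave.open(path, "rb") as wav:
        return video_frame_count(wav.getnframes(), wav.getframerate(), fps)
```

For a 10-second clip sampled at 16 kHz, `video_frame_count(160000, 16000)` gives 300 frames at the assumed 30 fps. A longer song simply yields more frames.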
Watch a video made with EMO AI below
The video above uses an anime drawing to create an AI-simulated scene. Here too, EMO AI produced convincing facial and lip movements for the character.
EMO AI Tool Structure
The methodology behind EMO AI was designed specifically to advance the quality of AI-generated video. The researchers looked for alternative ways to improve the quality of the result.
To that end, in the initial stage of the process, called Frame Encoding, a neural network called ReferenceNet extracts features from the single reference image and from the motion frames. This encoding lays the foundation for the video.
Next, the audio is incorporated with the help of an audio encoder, and facial masks are applied to enable realistic facial movements. Finally, a component called the Backbone Network is responsible for preserving the character's identity and adjusting the speed of facial movement.
Despite the innovative process, the Alibaba developers report in their scientific paper that the model has limitations. They note that EMO AI takes longer to produce content than other AIs in the same segment and that, in some cases, other parts of the body, such as the character's hands, may appear in the video.
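The order of these steps can be shown with a toy data-flow sketch. It is not EMO's implementation and contains no neural network: every name in it (`Frame`, `encode_reference`, `encode_audio`, `generate`) is hypothetical, and each function is a simple stand-in showing only what feeds into what.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int
    identity: str         # reference features the frame was conditioned on
    audio_feature: float  # audio feature that drives this frame
    masked_region: str    # region the facial mask restricts motion to


def encode_reference(image_name: str) -> str:
    """Stand-in for the first stage: derive identity features from one image."""
    return f"features({image_name})"


def encode_audio(samples: List[float], n_frames: int) -> List[float]:
    """Stand-in audio encoder: one averaged feature per video frame."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    chunk = max(1, len(samples) // n_frames)
    features = []
    for i in range(n_frames):
        # Fall back to silence if the audio runs out before the frames do.
        window = samples[i * chunk:(i + 1) * chunk] or [0.0]
        features.append(sum(window) / len(window))
    return features


def generate(image_name: str, samples: List[float], n_frames: int) -> List[Frame]:
    """Stand-in for the final step: every frame shares one identity, so the
    character stays the same while the audio feature changes frame by frame."""
    identity = encode_reference(image_name)
    audio_features = encode_audio(samples, n_frames)
    return [Frame(i, identity, a, "face") for i, a in enumerate(audio_features)]
```

Calling `generate("portrait.png", [0.0, 1.0, 2.0, 3.0], 2)` returns two frames that share the same identity string but carry different audio features, which mirrors the division of labor described above: the reference image fixes who appears, and the audio decides how the face moves.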
The search for AIs that simulate actions
With the growing compatibility of AIs with mobile operating systems, demand for these tools has skyrocketed in recent years. It is now possible to find many apps that swap the user's face with a celebrity's, age it, rejuvenate it or correct facial expressions, among other features.
Among these possibilities is the deepfake, which is the result of an AI matching or replacing a face. It can be used for different purposes, including humorous, political or even pornographic ones. On the political side, in Brazil the TSE has raised the alarm on this issue, already anticipating the use of deepfakes in this year's October elections.
In this case, deepfakes are fertile ground for fake news, since they generally portray a candidate in fabricated situations or delivering controversial speeches that he or she never gave, motivated by political interests.
EMO AI produces expressions in several languages
Another barrier overcome by AIs, including EMO AI, is producing videos in different languages. These technologies handle different languages, the sound of their words and their pronunciation, which makes it possible to produce audiovisual content in many languages.
Sources: NowadAIs, arXiv, HumanAIGC.
Reviewed by Glaucon Vital on 28/2/24.