News

News 1

ETRI Unveils Ultra-Fast Generative Visual Intelligence Model: Creates Images in Just 2 Seconds

- Unveiled a high-speed generative visual intelligence model that is five times faster than DALL-E 3.
- Released three types of image generation models and two types of conversational visual-language models.

ETRI researchers have unveiled a technology that combines generative AI and visual intelligence to create images from text inputs in just 2 seconds, propelling the field of ultra-fast generative visual intelligence.

The Electronics and Telecommunications Research Institute (ETRI) announced the release of five models to the public.(1) These include three ‘KOALA’(2) models, which generate images from text inputs five times faster than existing methods, and two conversational visual-language models, ‘Ko-LLaVA,’(3) which can answer questions about images or videos.

The ‘KOALA’ model cut the parameter count(5) from the 2.56B (2.56 billion) of the public SW model(4) to 700M (700 million) using the knowledge distillation technique.(6) A high number of parameters typically means more computation, leading to longer processing times and higher operating costs. The researchers reduced the model to roughly a third of its original size, making high-resolution image generation twice as fast as before and five times faster than DALL-E 3.
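The general idea behind knowledge distillation can be illustrated with a toy example. The sketch below is not ETRI's training procedure: the cubic "teacher," the linear "student," and all names and values are hypothetical stand-ins chosen only to show a small model being fitted to a larger model's outputs rather than to labeled data.

```python
# Toy "teacher": a fixed cubic with 4 parameters (hypothetical stand-in for a large model).
TEACHER_COEFFS = (0.5, 2.0, 0.0, 0.3)  # a0 + a1*x + a2*x^2 + a3*x^3


def teacher(x: float) -> float:
    a0, a1, a2, a3 = TEACHER_COEFFS
    return a0 + a1 * x + a2 * x**2 + a3 * x**3


def student(x: float, w: float, b: float) -> float:
    """Toy "student": a line with only 2 parameters."""
    return w * x + b


def distill_loss(w: float, b: float, xs: list[float]) -> float:
    """Mean squared gap between student and teacher outputs (no labels used)."""
    return sum((student(x, w, b) - teacher(x)) ** 2 for x in xs) / len(xs)


def distill(xs: list[float], steps: int = 500, lr: float = 0.1) -> tuple[float, float]:
    """Fit the student to the teacher's outputs with plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [student(x, w, b) - teacher(x) for x in xs]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
        grad_b = sum(2 * e for e in errors) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(-10, 11)]  # inputs on which the teacher is queried
loss_before = distill_loss(0.0, 0.0, xs)
w, b = distill(xs)
loss_after = distill_loss(w, b, xs)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The student cannot match the teacher exactly because it has fewer parameters, but its loss drops sharply. That trade-off between size and fidelity is what distillation manages.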

Amid intense domestic and international competition in text-to-image generation, ETRI has considerably reduced the model’s size(7) and brought generation time down to around 2 seconds, which allows the model to run on low-cost GPUs with only 8GB of memory.
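As a rough check on these figures, the quoted parameter counts can be turned into a size ratio and an approximate weight footprint. The 2-bytes-per-parameter value (16-bit floats) is an assumption made for illustration, not something the article states. The estimate covers only the quoted parameter counts, not activations or any other model components.

```python
# Parameter counts quoted in the article.
TEACHER_PARAMS = 2.56e9  # public SW model
KOALA_PARAMS = 700e6     # smallest KOALA model

# Assumption (not from the article): weights are stored as 16-bit floats.
BYTES_PER_PARAM = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


size_ratio = KOALA_PARAMS / TEACHER_PARAMS
teacher_gb = weight_memory_gb(TEACHER_PARAMS)
koala_gb = weight_memory_gb(KOALA_PARAMS)

print(f"size ratio: {size_ratio:.2f}")
print(f"teacher weights: {teacher_gb:.2f} GB")
print(f"KOALA weights:   {koala_gb:.2f} GB")
```

Under that assumption the 700M model's weights take about 1.4 GB against roughly 5.1 GB for the 2.56B model, which is consistent with the claim that the smaller model fits comfortably on an 8GB GPU.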

ETRI’s three ‘KOALA’ models, developed in-house, have been released in the HuggingFace(8) environment.

In practice, when the research team input the sentence “a picture of an astronaut reading a book under the moon on Mars,” the ETRI-developed KOALA 700M model created the image in just 1.6 seconds, significantly faster than Kakao Brain’s Karlo (3.8 seconds), OpenAI’s DALL-E 2 (12.3 seconds), and DALL-E 3 (13.7 seconds).
(1) DALL-E 3, unveiled by OpenAI
(2) https://huggingface.co/spaces/etri-vilab/KOALA
(3) https://huggingface.co/spaces/etri-vilab/Ko-LLaVA
(4) Lightweight Stable Diffusion Model: Stable Diffusion XL (SDXL)
(5) Parameters play a role akin to the ‘synapses’ (junctions between neurons) in the brain, with larger numbers correlating to higher performance. 2.56B signifies 2.56 billion parameters.
(6) Knowledge Distillation technique: A method for model compression, involving the transfer of information from a larger model to a smaller one.
(7) Model sizes: 1.7B (Large), 1B (Base), 700M (Small)
(8) Hugging Face: An ecosystem facilitating the easy sharing, deployment, usage, and training of machine learning technologies, particularly deep learning models.
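
The generation times quoted for the astronaut prompt can be converted into speedup factors relative to KOALA 700M. This is plain arithmetic on the four numbers in the article. The dictionary keys are labels chosen here for illustration.

```python
# Generation times in seconds for the single prompt quoted in the article.
timings_s = {
    "KOALA 700M": 1.6,
    "Kakao Brain model": 3.8,
    "DALL-E 2": 12.3,
    "DALL-E 3": 13.7,
}

baseline = timings_s["KOALA 700M"]

# Speedup = how many times longer each model takes than KOALA 700M.
speedups = {
    name: seconds / baseline
    for name, seconds in timings_s.items()
    if name != "KOALA 700M"
}

for name, factor in speedups.items():
    print(f"KOALA 700M is {factor:.1f}x faster than {name}")
```

For this single prompt the ratio to DALL-E 3 comes out at about 8.6, higher than the "five times" headline figure. The article does not say how the headline figure was measured.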

Lee Youngwan, Senior Researcher
Visual Intelligence Research Section
(+82-42-860-1126, yw.lee@etri.re.kr)

News 2

ETRI Unveils AI Analysis Service Platform at International E-sports Tournament

- Achieves independence from game vendors and scalability by analyzing game screens.
- Attains an 87% accuracy rate in match predictions, receiving significant acclaim when applied to the international League of Legends (LoL) tournament.

ETRI’s researchers have developed an AI-powered e-sports analysis platform that provides real-time win rate prediction services by analyzing gameplay screens. This platform was notably applied to the highly popular League of Legends (LoL) during a recent international e-sports tournament, garnering positive feedback.

The Electronics and Telecommunications Research Institute (ETRI) has developed a technology that recognizes game situations in real time by analyzing play elements extracted from game videos, and that automatically generates highlights by identifying key play events in the game.

This AI-based e-sports service platform also builds gamer profiles from gameplay data and suggests corresponding play strategies. By removing the dependence on game developers, it paves the way for expansion across various game genres and should significantly aid the creation and commercialization of new services.

Unlike previous services that were limited to commentary-focused broadcasting due to restricted access to game developer APIs, the research team has developed technology that provides various predictive information in addition to key gameplay indicators through real-time game screen analysis.

Sang-Kwang Lee, Principal Researcher
Immersive Interaction Research Section
(+82-42-860-6159, sklee@etri.re.kr)

News 3

ETRI Achieves World-Leading 3Gbps with 5G Small Cell Technology

- Delivers 3Gbps downlink and 800Mbps uplink speeds based on dual connectivity
- Key technologies aimed at global commercialization

ETRI has developed ‘5G Small Cell Base Station SW’ that supports 5G new radio dual connectivity (5G NR-DC), a technology that allows mobile devices to use both mid-band (3.5GHz) and mmWave (28GHz) frequencies for improved network coverage and data rates. By providing gigabit-level download speeds, it is expected to greatly help the spread of 5G private networks in enterprise settings such as smart factories and defense networks.

5G NR dual connectivity using mid-band and mmWave frequencies is critical to delivering the multi-gigabit speeds and massive capacity required for 5G consumer and enterprise applications such as smart factories and smart buildings. In a trial, ETRI researchers aggregated 400MHz of mmWave spectrum (28GHz) and 100MHz of mid-band spectrum (3.5GHz) to reach downlink speeds of up to 3Gbps on an individual mobile device.
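To put the trial figures in perspective, the aggregated bandwidth and the peak rate can be combined into an average number of bits per second per hertz. This is a back-of-the-envelope sketch only: the article does not report per-band throughput, the TDD downlink ratio, or the number of MIMO layers, so the result is an aggregate ratio and not a measured spectral efficiency for either band.

```python
# Figures quoted in the article.
MMWAVE_BW_HZ = 400e6      # 28GHz band
MIDBAND_BW_HZ = 100e6     # 3.5GHz band
PEAK_DOWNLINK_BPS = 3e9   # peak downlink on one device

total_bw_hz = MMWAVE_BW_HZ + MIDBAND_BW_HZ

# Aggregate ratio of peak rate to total bandwidth (bit/s per Hz).
avg_bits_per_hz = PEAK_DOWNLINK_BPS / total_bw_hz

# Share of the aggregated spectrum contributed by the mmWave band.
mmwave_share = MMWAVE_BW_HZ / total_bw_hz

print(f"total bandwidth: {total_bw_hz / 1e6:.0f} MHz")
print(f"average: {avg_bits_per_hz:.1f} bit/s/Hz")
print(f"mmWave share of spectrum: {mmwave_share:.0%}")
```

The mmWave band supplies 80% of the aggregated spectrum, which is why blockage of that band matters so much for sustaining multi-gigabit rates.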

Although a large portion of the spectrum for 5G mobile access has been secured in the mmWave bands, signals at these frequencies propagate over much shorter distances than in sub-6GHz implementations. In addition, highly directional mmWave beams are easily blocked, causing disruptions to the link. As a result, most 5G mmWave deployments are still dominated by the Non-StandAlone (NSA) architecture, which uses LTE as the anchor for the control plane, with the user plane flowing directly to the EPC (4G) or 5G core network. ETRI highlighted that its 5G SA NR-DC technology combines mid-band and mmWave frequencies, providing a more stable link for the control plane on mid-band frequencies and enabling high-performance, latency-sensitive applications on mmWave frequencies.

Jee-Hyeon Na, Director
Intelligent Small Cell Research Section
(+82-42-860-5408, jhna@etri.re.kr)

ETRI Webzine Vol.75 FEBRUARY