
fltech - The Fujitsu Research Tech Blog

A technical blog where researchers at Fujitsu Research write about a variety of topics

Introducing Fujitsu KG Enhanced RAG (4 sessions) #4 Fujitsu KG Enhanced RAG for VA (Vision Analytics)

Hello. This is Ikai from the Artificial Intelligence Laboratory.

To promote the use of generative AI in enterprises, Fujitsu has developed an "Enterprise-wide Generative AI Framework Technology" that can flexibly respond to diverse and changing corporate needs, handle the vast amounts of data that companies hold, and make it easy to comply with laws and regulations. The framework has been rolled out in stages since July 2024 as part of the Fujitsu Kozuchi (R&D) AI service lineup.

Some of the challenges that enterprise customers face when leveraging specialized generative AI models include:

  • Difficulty handling large amounts of data required by the enterprise
  • Generative AI cannot meet cost, response speed, and various other requirements
  • Requirement to comply with corporate rules and regulations

To address these challenges, the framework consists of the following technologies:

  • Fujitsu Knowledge Graph Enhanced RAG ( *1 )
  • Generative AI Amalgamation Technology
  • Generative AI Audit Technology

In this series, we introduce the "Fujitsu Knowledge Graph Enhanced RAG" every week. We hope this helps you to solve your problems. At the end of the article, we'll also tell you how to try out the technology.

Fujitsu Knowledge Graph Enhanced RAG: Overcoming Generative AI's Inability to Accurately Reference Large-Scale Data

Existing RAG techniques, which let generative AI refer to related documents such as internal company documents, struggle to reference large-scale data accurately. To solve this problem, we have developed Fujitsu Knowledge Graph Enhanced RAG (hereinafter, Fujitsu KG Enhanced RAG). By extending existing RAG technology and automatically creating knowledge graphs that structure the huge amounts of data owned by companies, such as corporate regulations, laws, manuals, and videos, it expands the amount of data an LLM can reference from the hundreds of thousands to millions of tokens of conventional RAG to more than 10 million tokens. Knowledge based on the relationships captured in the knowledge graph can then be fed accurately to the generative AI, enabling logical reasoning and making the rationale for its output visible.

This technology consists of four component technologies, each suited to different target data and application scenarios.

  1. Root Cause Analysis (already published)
    This technology creates a report on the occurrence of a failure based on system logs and failure case data, and suggests countermeasures based on similar past failure cases.
  2. Question & Answer (already published)
    This technology makes it possible to conduct advanced Q&A based on a comprehensive view of large amounts of document data, such as product manuals.
  3. Software Engineering (already published)
    This technology not only understands source code, but also generates high-level functional design documents and summaries, and supports modernization.
  4. Vision Analytics (this article)
    This technology detects specific events and dangerous actions in video data, and can even propose countermeasures.

In this article, I will introduce 4. Vision Analytics in detail.

What is the Fujitsu KG Enhanced RAG for VA (Vision Analytics)?

Generative AI such as ChatGPT can now understand the content of user-specified video data, summarize it, and answer questions about it. This is a great technological advance, but if all it can do is answer questions about a few minutes of video, it is hard to imagine how to use it in a business setting. A few minutes of video does not take much time for a person to watch, and many videos, such as video messages and training videos, are filmed and edited with the viewer in mind, so there is little need for summaries.

On the other hand, there are also businesses that deal with unedited, unsummarized, long-duration video over months, such as surveillance cameras installed in warehouses and factories, and security cameras in commercial facilities. These unedited, ultra-long videos are meant to be viewed in the event of an accident or incident. However, because (1) video data is unstructured data that cannot be easily analyzed, and (2) it is too large and expensive to analyze manually, it has not been fully utilized for data analysis.

But what if an AI could memorize all the content of a months-long video, and then analyze and respond to that information by simply asking a chat question?

"Please tell me the scene of the accident risk from the surveillance camera images in the warehouse for the last 3 months."

"What were the workers doing before and after the dangerous behavior? Take a tally."

"How did the cashier crowd change before and after the layout change two months ago?"

"Please quantify the time period when the rest area is crowded by day of the week."

By asking questions like these, we could extract all kinds of useful information and put it to work in the business.

The Fujitsu KG Enhanced RAG for VA that Fujitsu is developing is a technology that enables this new use of video.

Behavioral Knowledge Graph - Transforming Ultra-Long Video into Structured Data

Next, I will briefly introduce the technical details.

The key technology that enables generative AI to understand ultra-long video spanning several months is the technology that converts unstructured video data into structured graph data. Graph data is not the typical tabular data format. Instead, data is recorded as a network of nodes and edges that represent the relationships between them, which makes graph data well suited to recording relationships and similarities between data items.

Fujitsu has developed video analysis technologies such as Fujitsu Kozuchi for Vision and Actlyzer, which can easily recognize complex human behavior by combining trained models covering roughly 100 types of basic human actions with behavior recognition rules created in a no-code UI. The behavioral knowledge graph represents the people, objects, actions, times, places, and so on recognized by Fujitsu Kozuchi for Vision as nodes, expresses the relationships between them as edges, and stores the scenes in the video as data that can be analyzed.

Suppose, for example, that a warehouse video records an event in which a worker and a forklift get abnormally close. From that video, a behavioral knowledge graph like the one shown in the figure above is generated. The scene is represented as compact graph data that captures the person's action of approaching the forklift, the person and the forklift involved, the place, the time, and the situation before and after.
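To make this concrete, here is a minimal sketch in Python (using networkx) of how such a scene could be recorded as graph data. The node identifiers, attributes, and relation names ("Person", "approaches forklift", "followed_by", and so on) are illustrative assumptions, not the actual schema used by Fujitsu Kozuchi for Vision.

```python
# Minimal sketch: recording the forklift scene as a behavioral knowledge graph.
# Node identifiers, attributes, and relation names are illustrative assumptions.
import networkx as nx

kg = nx.MultiDiGraph()

# Entities and actions recognized in the video become typed nodes.
kg.add_node("person_17", type="Person", role="worker")
kg.add_node("forklift_3", type="Vehicle", kind="forklift")
kg.add_node("zone_B", type="Place", name="loading area B")
kg.add_node("act_pick_900", type="Action", label="picks up package",
            start="2024-10-02T10:14:40", end="2024-10-02T10:15:01")
kg.add_node("act_approach_901", type="Action", label="approaches forklift",
            start="2024-10-02T10:15:04", end="2024-10-02T10:15:09")

# Relationships between people, actions, targets, and places become edges.
kg.add_edge("person_17", "act_pick_900", relation="performs")
kg.add_edge("person_17", "act_approach_901", relation="performs")
kg.add_edge("act_approach_901", "forklift_3", relation="target")
kg.add_edge("act_approach_901", "zone_B", relation="occurs_at")
kg.add_edge("act_pick_900", "act_approach_901", relation="followed_by")

# The whole scene is now compact, queryable structured data.
print(kg.number_of_nodes(), "nodes and", kg.number_of_edges(), "edges")
```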

In actual operation, the system recognizes not only the action of approaching a forklift but also various other actions in real time, such as the work being performed in the warehouse, and can convert every scene captured by the continuously running cameras into graph data and store it as needed. As the analysis continues over many months the number of graph nodes grows, but by using a dedicated graph database (hereinafter, graph DB), analysis can be performed orders of magnitude faster than on the raw video data.

For example, to investigate in detail the circumstances in which an action with accident risk occurred, consider tallying the actions performed just before and after by the person involved, as well as the objects those actions were directed at. In the graph DB, this is done by starting from the node for "the action with accident risk" and tracing to the nodes for the actions that occurred around it and their targets. This kind of traversal is a poor fit for an ordinary tabular database (relational DB). In one study, a graph DB traversing nodes to a depth of 4 in graph data of 1 million nodes was more than 1,000 times faster than a relational DB, finishing in 1.4 seconds; at depth 5 the graph DB took 2.1 seconds, while the relational DB never finished. By converting video data into graph data and accumulating it, we can analyze what happened before and after an action and what was involved, which enables more advanced uses such as investigating causes and devising countermeasures.
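As an illustration of this kind of traversal, the sketch below walks outward from a risky-action node to a fixed depth and tallies the action labels found nearby. It reuses the hypothetical `kg` graph from the previous sketch; a production system would instead run an equivalent query directly against the graph DB.

```python
# Sketch: starting from a risky-action node, trace nearby actions up to a
# fixed depth and tally them (reusing the hypothetical `kg` graph above).
from collections import Counter

import networkx as nx

def nearby_actions(kg, start_node, depth=4):
    """Tally the labels of Action nodes within `depth` hops of start_node."""
    # Walk relations in both directions by taking an undirected copy.
    neighborhood = nx.ego_graph(kg.to_undirected(), start_node, radius=depth)
    return Counter(
        kg.nodes[n]["label"]
        for n in neighborhood.nodes
        if kg.nodes[n].get("type") == "Action" and n != start_node
    )

print(nearby_actions(kg, "act_approach_901", depth=4))
# e.g. Counter({'picks up package': 1})
```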

This technology has made it possible to convert extremely long videos, which humans could not watch or remember, into data that can be easily analyzed. A large amount of video that could not be used effectively until now will be transformed into a valuable behavioral knowledge graph full of knowledge.

Fujitsu KG Enhanced RAG Technology for Retrieving Knowledge from Graph Data

Next, I'll show you how to extract the data you want to know from the behavioral knowledge graph.

If you are an expert in analyzing graph data, you can use existing tools to extract knowledge from behavioral knowledge graphs and apply it directly to your business. However, analyzing graph data requires a degree of specialist knowledge and experience. Fujitsu has therefore developed Fujitsu KG Enhanced RAG for VA, a generative AI technology for enterprises that lets generative AI analyze behavioral knowledge graphs, so that knowledge can be retrieved through a natural-language chat interface.

For example, a user might ask the generative AI in a chat, "From the warehouse surveillance camera footage over the last three months, show me the scenes where there was an accident risk." Based on this question, the generative AI performs the following processing (a minimal code sketch follows the list).

  • Step 1. Consult the business information knowledge graph (a knowledge graph that summarizes an individual company's business information) to determine what kinds of events count as an "accident risk," and extract them as concrete actions.
  • Step 2. Extract the actions related to those accident-risk events from the behavioral knowledge graph and feed them to the generative AI as RAG input.
  • Step 3. The generative AI analyzes and aggregates the input behavioral knowledge graph and presents the results to the user via chat, for example: "Based on the warehouse surveillance camera footage over the last three months, the incidents with accident risk are as follows. Approaching forklifts: 85 cases, ..."
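Below is a minimal sketch of how these three steps could be wired together. The function names, query parameters, and prompt are all assumptions for illustration; the actual Fujitsu KG Enhanced RAG implementation is not public and will differ.

```python
# Illustrative sketch of the three-step flow. All function names, query
# parameters, and the prompt are assumptions, not the actual implementation.

def answer_video_question(question: str, business_kg, behavior_db, llm) -> str:
    # Step 1: resolve an abstract concept ("accident risk") into concrete
    # actions using the business information knowledge graph.
    risky_actions = business_kg.lookup(concept="accident risk")
    # e.g. ["person approaches forklift", "person enters restricted zone"]

    # Step 2: retrieve matching scenes (with surrounding context) from the
    # behavioral knowledge graph stored in the graph DB.
    scenes = behavior_db.query(
        action_labels=risky_actions,
        time_range="last 3 months",
        context_hops=2,
    )

    # Step 3: feed the retrieved subgraphs to the generative AI as RAG input
    # and let it aggregate and explain the results in natural language.
    prompt = (
        "You are analyzing warehouse surveillance data.\n"
        f"Question: {question}\n"
        f"Retrieved scenes (as graph triples): {scenes}\n"
        "Summarize the accident-risk incidents with counts per action type."
    )
    return llm.generate(prompt)
```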

In other words, by combining the technology that converts ultra-long videos into behavioral knowledge graphs with the technology that lets generative AI extract and analyze graph data, we can realize a system in which an AI that has memorized months of surveillance footage instantly analyzes the data and answers the user's questions.

Use cases for this system include:

  • Investigating the causes of hazardous behavior in the workplace: for example, ask "What did the worker and the forklift do in the three minutes before and after each case in which a worker got too close to a forklift? Tally it across all cases." By statistically grasping the situations before and after the dangerous action, the cause can be investigated and site safety improved.
  • Statistically understanding sales floor conditions at retail stores and taking countermeasures: for example, ask "Summarize the congestion in front of the checkout by day of the week and by time of day." The results can be reflected in staff allocation plans to make the sales floor more comfortable (a tallying sketch follows this list).
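As a rough illustration of the second use case, the following sketch tallies hypothetical "congestion at checkout" events, assumed to have already been extracted from the behavioral knowledge graph, by weekday and hour using pandas; the event schema is an assumption.

```python
# Sketch: tally congestion events (assumed to be already extracted from the
# behavioral knowledge graph) by weekday and hour. The schema is illustrative.
import pandas as pd

events = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-10-05 11:10", "2024-10-05 17:45",
        "2024-10-12 11:20", "2024-10-13 16:05",
    ]),
    "event": ["congestion_at_checkout"] * 4,
})

events["weekday"] = events["timestamp"].dt.day_name()
events["hour"] = events["timestamp"].dt.hour

summary = events.groupby(["weekday", "hour"]).size().rename("count")
print(summary)
```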

Other applications include improving safety in public spaces such as stations and airports, improving production efficiency through work analysis in factories, and detecting trouble and investigating preventive measures at large-scale commercial facilities.

To realize this system, in addition to the ultra-long-video graphing technology described above, the following advanced technologies for linking the knowledge graph with an LLM are used (a sketch of the first is shown after the list):

  • Technology that generates appropriate queries to the graph DB based on the user's natural-language questions.
  • Technology that derives an appropriate answer to the user's question by extracting the relevant subgraph and feeding its information into an LLM.
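Here is a minimal sketch of the first of these two technologies, under the assumption that the graph DB accepts a Cypher-style query language and that a generic text-completion LLM client is available. The schema description, prompt, and `llm_complete` callable are illustrative, not the actual implementation.

```python
# Sketch: have an LLM translate a natural-language question into a graph query.
# The schema description, prompt, and `llm_complete` callable are assumptions.

GRAPH_SCHEMA = """
Nodes: Person, Vehicle, Place, Action(label, start, end)
Edges: performs, target, occurs_at, followed_by
"""

def generate_graph_query(question: str, llm_complete) -> str:
    """Ask the LLM to produce a Cypher-style query over the behavioral KG."""
    prompt = (
        "Translate the user's question into a Cypher query for this schema.\n"
        f"Schema:\n{GRAPH_SCHEMA}\n"
        f"Question: {question}\n"
        "Return only the query."
    )
    return llm_complete(prompt)

# Usage (llm_complete is any text-completion callable):
# query = generate_graph_query(
#     "Show scenes in the last 3 months where a worker approached a forklift",
#     llm_complete,
# )
# The generated query is run against the graph DB, and the returned subgraph
# is then serialized and fed back to the LLM to produce the final answer.
```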

As further developments, research is under way at the Fujitsu Macquarie AI Research Lab, one of the Fujitsu Small Research Labs (SRL), into, for example, coaching technology that provides AI-powered safety guidance and personalization technology that tailors responses to each individual rather than giving everyone the same answer.

You can also read more about the Fujitsu Macquarie AI Research Lab here.

How to try out the Fujitsu KG Enhanced RAG for VA technology

Fujitsu KG Enhanced RAG for VA was added to the Fujitsu Kozuchi (R&D) lineup in November 2024.

If you are interested in using this technology in your business, please contact us via the "Contact Us" section of this site.

Acknowledgement

The technology and demo app in this article were developed by the following members. I would like to take this opportunity to introduce them.

  • Fujitsu Australia Limited: Shun Takeuchi
  • Fujitsu Research - Artificial Intelligence Laboratory - Human Reasoning CPJ: Takashi Honda, Yoshiaki Ikai, Sosuke Yamao, Jun-ya Saito, Takahiro Saito, Arisu Endo, Yu Takahashi, Guillaume Pelat, Noriko Itani, Shingo Hioronaka, Takashi Kikuchi, Yuki Harazono, Issei Inoue, Natsuki Miyahara, Yukio Hirai, Jun-ya Fujimoto, Yoshihisa Asayama, Satoru Takahashi, Toshio Endo, Akihiko Yabuki.

*1: RAG (Retrieval-Augmented Generation): a technology that extends the capabilities of generative AI by combining it with external data sources.