
fltech - Fujitsu Research Tech Blog

A technical blog where researchers at Fujitsu Research write about a variety of topics

Revolutionizing Cancer Genomics: The Power of XAI and Knowledge Graphs

Hello,

I'm Katsuhiko Murakami from the Genome AI Team at the Computing Laboratory.

We've developed the world's first Explainable AI (XAI) for fusion genes, integrating XAI with knowledge graphs. Our research has been published in the cancer journal Cancers (Basel) (Impact Factor 5.2):

"Pathogenicity prediction of gene fusion in structural variations: a knowledge graph-infused explainable artificial intelligence (XAI) framework", K. Murakami, et al. Cancers (Basel), May 2024.

I'd like to introduce this technology here. An overview of this work has also been announced in a press release by the Institute of Medical Science, the University of Tokyo. In this blog post, I'll delve a bit deeper into the technical aspects.

The Importance of Fusion Gene Analysis and Explainable AI (XAI) in Cancer Genomic Medicine

Cancer development and progression involve various genetic mutations. Traditionally, the focus has been on single nucleotide variants, deletions, or insertions in specific genes. However, with advancements in whole genome sequencing technology, gene fusion has been recognized as another crucial cause of cancer. Gene fusion occurs when two separate genes abnormally combine to form a new fusion gene. The protein produced by this fusion gene can disrupt normal cellular functions, leading to cancer onset and progression. However, analyzing whether a fusion gene observed in a patient is the cause of cancer is complex, time-consuming, and requires high-level expertise. Thus, developing efficient and accurate analysis methods has become urgent.

One approach to address this challenge is "pathogenicity prediction" using AI to narrow down the analysis to important fusion genes. AI can efficiently process vast amounts of data and support human experts.

However, the introduction of AI brings new challenges. In particular, the importance of explainable AI (XAI) has been emphasized recently. To ensure reliability in medical settings, it's necessary to make AI decision processes transparent and solve the black box problem.

Developing Explainable AI (XAI) Using Knowledge Graphs - Achieving World-Top Prediction Accuracy and Explanations Based on Cancer Development Mechanisms

Prior to our team's work on this problem, Fujitsu had developed the DeepTensor algorithm, an XAI capable of handling graphs and outputting high-accuracy predictions along with their rationale (Maruhashi, Koji et al. "Learning Multi-Way Relations via Tensor Decomposition With Neural Networks". Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018). Our team used this to develop a system that can learn from knowledge graph structures, enabling high-accuracy predictions with explanations (Press Release; Fuji, et al., 2018). Furthermore, we had developed an XAI for pathogenicity prediction of single nucleotide variants, a major mutation type in cancer (Abe et al., 2023).
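DeepTensor-style methods treat the multi-way relations of a graph as a tensor to be decomposed and learned from. As a purely illustrative sketch (the triples, entity names, and encoding below are my own toy example, not the actual DeepTensor pipeline), a set of knowledge-graph triples can be encoded as a 3-way entity × relation × entity tensor:

```python
import numpy as np

# Hypothetical toy triples (subject, relation, object); the names are
# illustrative only, not taken from the actual training data.
triples = [
    ("KIF5B::RET", "has_domain", "Pkinase"),
    ("KIF5B::RET", "involves_gene", "RET"),
    ("RET", "has_domain", "Pkinase"),
]

entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
relations = sorted({t[1] for t in triples})
e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: i for i, r in enumerate(relations)}

# Binary 3-way tensor: entity x relation x entity; each triple sets one cell.
tensor = np.zeros((len(entities), len(relations), len(entities)))
for s, r, o in triples:
    tensor[e_idx[s], r_idx[r], e_idx[o]] = 1.0

print(tensor.shape, int(tensor.sum()))
```

A tensor-decomposition model can then learn low-rank factors of this tensor, and the learned factors can be traced back to the contributing graph elements, which is what makes the approach explainable.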

In this current work, we developed an XAI (the world's first for fusion genes) that can accurately predict the pathogenicity of fusion genes and interpret the contributing features to explain the prediction rationale in text. This required constructing a new knowledge graph for fusion gene analysis. We formed a collaborative research team with world-class medical researchers, selected necessary information for fusion gene analysis such as protein functional domains, and iteratively assessed their usefulness. As a result, we constructed a knowledge graph that effectively represents information for prediction and explanation (Figure 1).
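To give a feel for why protein functional domains belong in such a graph, here is a minimal, hypothetical sketch of looking up the domains contributed by the two partner genes of a fusion. The schema, relation name, and domain lists are illustrative assumptions, not the paper's actual knowledge graph:

```python
# Toy knowledge graph as (gene, relation) -> values; names are illustrative.
graph = {
    ("KIF5B", "has_domain"): ["Kinesin_motor", "Coiled_coil"],
    ("RET", "has_domain"): ["Pkinase"],
}

def fusion_features(gene5, gene3, graph):
    """Collect the protein functional domains of the 5' and 3' fusion partners."""
    return {
        "5prime_domains": graph.get((gene5, "has_domain"), []),
        "3prime_domains": graph.get((gene3, "has_domain"), []),
    }

feats = fusion_features("KIF5B", "RET", graph)
print(feats)
```

In the real system, such domain features feed both the prediction model and the explanation, since a retained kinase domain in the fusion product is exactly the kind of mechanistic evidence an expert would look for.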

Figure 1. Overview of the explainable AI (XAI) methodology using the knowledge graph and deep tensor (fusion gene version).
Top: Pathogenicity prediction function for cancer genome fusion genes. It applies deep learning to features drawn from the knowledge graph, and also outputs the features and their contributions as the prediction rationale (the world's first XAI for fusion genes). Middle: A knowledge graph loaded with basic cancer information and fusion gene information. Bottom: A function that explains the basis of the prediction in text. Based on the knowledge-graph features that contributed to the prediction, it automatically generates an easy-to-understand explanation using a large language model (LLM).

Additionally, we use a Large Language Model (LLM) (GPT-4) to generate textual explanations based on the prediction reasons and related texts. This makes the prediction results easier to understand, even for those without specialized knowledge, enhancing practical applicability. For input to the LLM, we use prompts generated from a pre-prepared template.
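As a rough illustration of template-based prompt generation, the sketch below fills a template with the prediction score and top-contributing features. The template wording, field names, and contribution values are my own assumptions, not the actual prompts used in the paper:

```python
# Hypothetical prompt template; not the paper's actual template.
TEMPLATE = (
    "The fusion gene {fusion} was predicted to be pathogenic "
    "(score {score:.2f}). The most contributing features were: {features}. "
    "Explain for a clinical audience why these features suggest pathogenicity."
)

def build_prompt(fusion, score, contributions):
    # Keep only the top-ranked features as the prediction rationale.
    top = sorted(contributions.items(), key=lambda kv: -kv[1])[:3]
    features = ", ".join(f"{name} ({weight:.2f})" for name, weight in top)
    return TEMPLATE.format(fusion=fusion, score=score, features=features)

prompt = build_prompt("KIF5B::RET", 0.97, {"Pkinase": 0.81, "Coiled_coil": 0.12})
print(prompt)
```

The resulting string would then be sent to the LLM (GPT-4 in our case) together with related texts, so the model grounds its explanation in the features the XAI actually used rather than inventing its own rationale.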

Our technology has two core components. The first is the unique knowledge graph used for learning and explanation. In addition to standard genomic information, this knowledge graph includes protein functional domain information, which is crucial for linking mutations to disease mechanisms in fusion genes. It not only contributes significantly to prediction accuracy but is also used to generate the explanations. The second is a function that explains the AI's judgment rationale in natural language, realized in collaboration with a large language model (LLM). This explanation-generation mechanism greatly improves the transparency of the AI's decision-making process.

Performance Evaluation of the Developed XAI

World-Top Level Prediction Accuracy on Benchmark Sets

We conducted two evaluation experiments.

In Experiment 1, we performed 10-fold cross-validation on Dataset A (Cosmic Fusion Export (v97) as positive examples, fusion genes observed in normal cells as negative examples). Our method matched the accuracy (98%) of ChimerDriver, the best-performing existing technology.

In Experiment 2, we trained on Dataset A and tested on the independent Dataset B. Our technology achieved an F1 score of 84.5%, surpassing ChimerDriver's 83.2%. Taken together, these results support our claim of world-top-level prediction accuracy.
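For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall over the positive (pathogenic) class. A minimal self-contained sketch with toy labels (no real model or dataset involved):

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy ground truth (1 = pathogenic) and toy predictions.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(f1_score(y_true, y_pred))  # → 0.8
```

Training on one dataset and testing on a fully independent one, as in Experiment 2, is the stricter protocol: it shows the model generalizes beyond the distribution it was fitted on.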

Evaluation of Explainability (Confirmation through Case Studies that Explanations are Consistent with Mechanisms)

We confirmed the effectiveness of the XAI's explanation function using specific fusion gene cases. In the example in Figure 2, for the fusion gene KIF5B::RET, the AI focused on three features (Basis, lower left), with Pkinase (indicating the kinase domain) showing a particularly large contribution. To make this more understandable, a textual explanation generated by a large language model is also output (on the right). This explanation clearly presents the prediction reason and basis together with the cancer progression mechanism, stating that "This fusion gene leads to constitutive activation of the kinase domain, resulting in cell proliferation and eventually cancer." Such explanations provide information that is easy for experts to understand and clinically significant.

Figure 2. An example of XAI output for the fusion gene KIF5B::RET (prediction score, primary protein structure, and explanation of prediction reasons).
The three light-orange panes, the yellow highlighting, and the red arrows were added manually by the author afterwards as annotations.

Conclusion

This technology has the potential to greatly contribute to cancer genomic medicine. It is expected to not only improve diagnostic accuracy and optimize treatment selection but also contribute to the creation of new research findings. Furthermore, this technology can be applied to other gene-related diseases and serves as a prime example of the importance of explainability in medical AI in general.

However, challenges remain. We have confirmed the effectiveness of the explanations on only a limited variety of cases, so their general effectiveness remains an open question. Moreover, pathogenicity determination for other types of structural variants that do not produce fusion genes is an important future challenge.

In the near future of genomic medicine, whole genome analysis will become increasingly important. There, not only single nucleotide variants but also structural variants, including large chromosomal abnormalities, will be comprehensively identified. Our technology is expected to contribute significantly in such scenarios.

While addressing the above challenges, we aim to further develop this technology and contribute to the advancement of cancer genomic medicine corresponding to whole genome analysis. The evolution of fusion gene analysis using XAI will be an important step towards realizing more precise and reliable healthcare.