A Hotspots Analysis-Relation Discovery Representation Model for Revealing Diabetes Mellitus and Obesity

Guannan He¹, Yanchun Liang¹˒², Yan Chen³, William Yang⁴, Jun S. Liu⁵, Mary Qu Yang⁶, Renchu Guan¹˒*

1 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Zhuhai Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China
3 Department of Endocrinology, The Second Hospital of Jilin University, Changchun 130000, China
4 Department of Computer Science, Carnegie Mellon University, PA 15213, USA
5 Department of Statistics, Harvard University, Cambridge, MA 02138, USA
6 MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences. 2801 S. Univ. Ave., Little Rock, 72204, USA

* Corresponding author, E-mail: guanrenchu@jlu.edu.cn

Abstract

Background: Diabetes mellitus and obesity have become some of the most serious public health challenges in the world and place a huge economic burden on society. To help researchers quickly uncover the complex and close relationships among diabetes mellitus, obesity, and related diseases in a large body of literature, and to inspire the search for effective treatments for these diseases, we propose a novel model named the representative latent Dirichlet allocation (RLDA) topic model.
Results: We applied our model to a corpus of more than 337,000 diabetes- and obesity-related publications from 2007 to 2016. To discover meaningful relationships between diabetes mellitus, obesity, and other diseases, we analyzed the output of our model explicitly with a series of visualization tools. We then used clinical reports that were not part of our training data, such as the Standards of Medical Care in Diabetes, to assess the credibility of our discoveries, and a sufficient number of these reports matched our discoveries directly. Our results show that over these 10 years, research on diseases accompanying obesity focused mainly on 12 diseases, such as asthma, gastric disease, and heart disease; research on diabetes covered a broader scope of 22 diseases, such as Alzheimer’s disease and heart disease; and 10 accompanying diseases were common to both: depression, anxiety, heart disease, hypertension, hepatitis, liver disease, lung disease, respiratory disease, tuberculosis, and myocardial infarction. In addition, tumor/TNF, hypertension, inflammation, and adolescent/child obesity or diabetes are likely to be hot research topics related to diabetes and obesity in the near future.
Conclusions: Using our model, we obtained hotspots analysis-relation discovery results for diabetes mellitus and obesity. For example, we extracted significant relationships between these two conditions and other diseases, such as heart disease, Alzheimer’s disease, and tumors. We believe that the newly proposed representation learning model can help biomedical researchers better focus their attention and adjust the direction of their work.