ORIGINAL ARTICLE
Year: 2022 | Volume: 12 | Issue: 2 | Page: 122-126
Using classification and K-means methods to predict breast cancer recurrence in gene expression data
Mohammadreza Sehhati1, Mohammad Amin Tabatabaiefar2, Ali Haji Gholami3, Mohammad Sattari4
1 Medical Image and Signal Processing Research Center, Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
2 Department of Genetics and Molecular Biology, School of Medicine, Isfahan University of Medical Sciences; Pediatric Inherited Diseases Research Center, Research Institute for Primordial Prevention of Non Communicable Disease, Isfahan University of Medical Sciences, Isfahan, Iran
3 Department of Hematology-Oncology, Isfahan University of Medical Sciences, Isfahan, Iran
4 Health Information Technology Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
Correspondence Address:
Mohammad Sattari, Health Information Technology Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
 Source of Support: None, Conflict of Interest: None
DOI: 10.4103/jmss.jmss_117_21
Background: Breast cancer starts in the breast tissue and affects about 10% of women at different stages of their lives. In this study, we applied a new method to predict breast cancer recurrence in biological networks built from gene expression data. Method: The method includes data collection, clustering, identification of differentiating genes, and classification. Eight techniques were implemented on 12,172 genes and 200 samples: random forest, support vector machine, neural network, hidden Markov model, joint mutual information, random forest + k-means, neural network + k-means, and support vector machine + k-means. Results: Thirty genes were identified as differentiating genes and used for classification. Random forest + k-means performed better than the other techniques. Two techniques, neural network + k-means and random forest + k-means, performed better than the others in identifying high-risk cases. Conclusion: Thirty of the 12,172 genes were used for classification, and the use of clustering improved the performance of the classification techniques.
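The steps named in the Method (clustering, determining differentiating genes, classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how k-means is combined with the classifiers, so the sketch assumes the cluster id is appended as an extra feature. The data are synthetic, the gene-ranking score is a simple difference of class means, and a nearest-class-centroid rule stands in for random forest so that the example needs only the Python standard library. All function names (`make_toy_data`, `top_genes`, `run_pipeline`, and so on) are hypothetical.

```python
import random
import statistics


def make_toy_data(n_samples=60, n_genes=200, n_informative=10, seed=0):
    """Synthetic stand-in for an expression matrix (rows: samples, columns: genes)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # 1 = recurrence (invented labels)
    data = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 2.0 * y  # only the first few genes carry signal
        data.append(row)
    return data, labels


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


def top_genes(data, labels, k):
    """Rank genes by |difference of class means| / overall standard deviation."""
    scored = []
    for g in range(len(data[0])):
        pos = [row[g] for row, y in zip(data, labels) if y == 1]
        neg = [row[g] for row, y in zip(data, labels) if y == 0]
        spread = statistics.pstdev(pos + neg) or 1.0
        scored.append((abs(statistics.mean(pos) - statistics.mean(neg)) / spread, g))
    return [g for _, g in sorted(scored, reverse=True)[:k]]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_vector(g) if g else c for g, c in zip(groups, centers)]
    return centers


def select(rows, genes):
    return [[row[g] for g in genes] for row in rows]


def run_pipeline(n_selected=30, n_clusters=2):
    data, labels = make_toy_data()
    train_x, train_y = data[:40], labels[:40]
    test_x, test_y = data[40:], labels[40:]

    # Step 1: pick differentiating genes on the training split only.
    genes = top_genes(train_x, train_y, n_selected)
    train_r, test_r = select(train_x, genes), select(test_x, genes)

    # Step 2: cluster training samples; append the cluster id as a feature
    # (an assumption about how clustering feeds the classifier).
    centers = kmeans(train_r, n_clusters)
    train_a = [row + [float(nearest(row, centers))] for row in train_r]
    test_a = [row + [float(nearest(row, centers))] for row in test_r]

    # Step 3: nearest-class-centroid rule as a stand-in for random forest.
    class_centers = [
        mean_vector([r for r, y in zip(train_a, train_y) if y == c]) for c in (0, 1)
    ]
    predictions = [nearest(row, class_centers) for row in test_a]
    accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
    return genes, predictions, accuracy
```

Calling `run_pipeline()` returns the 30 selected gene indices, the predicted labels for the held-out samples, and the resulting accuracy on the toy data. With real expression data, the stand-in classifier in step 3 could be replaced by a library implementation such as scikit-learn's `RandomForestClassifier`; selecting genes on the training split only, as done here, avoids leaking test labels into the feature selection.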