Journal of Medical Signals & Sensors

ORIGINAL ARTICLE
Year : 2022  |  Volume : 12  |  Issue : 4  |  Page : 278--284

A Novel Texture Extraction-Based Compressive Sensing for Lung Cancer Classification


Indrarini Dyah Irawati1, Sugondo Hadiyoso1, Gelar Budiman2, Arfianto Fahmi2, Rohaya Latip3
1 School of Applied Science, Telkom University, Bandung, Jawa Barat, Indonesia
2 School of Electrical Engineering, Telkom University, Bandung, Jawa Barat, Indonesia
3 Department of Communication Technology and Networking, Faculty of Computer Science and Information Technology, University Putra Malaysia, Seri Kembangan, Malaysia

Correspondence Address:
Indrarini Dyah Irawati
School of Applied Science, Telkom University, Bandung, Jawa Barat
Indonesia

Abstract

Background: Lung cancer images require large memory storage and high transmission bandwidth. Compressive sensing (CS), a statistical approach to signal sampling, produces different output patterns depending on the information source. CS can therefore be considered for feature extraction from compressed information. Methods: In this study, we propose a novel texture extraction-based CS for lung cancer classification. We classify three types of lung cancer: adenocarcinoma (ACA), squamous cell carcinoma (SCC), and benign lung cancer (N). The classification is based on texture extraction processed in two stages: the first stage detects N and the second distinguishes ACA from SCC. Results: The simulation results show that two-stage texture extraction improves accuracy to an average of 84%. The proposed system is expected to serve as decision support in clinical diagnosis. In terms of storage, the system can save memory resources. Conclusions: The proposed two-stage texture extraction system combined with CS and K-Nearest Neighbor classifies lung cancer with high accuracy and also saves memory storage. The complexity of the proposed method still needs to be examined in further analysis.



How to cite this article:
Irawati ID, Hadiyoso S, Budiman G, Fahmi A, Latip R. A Novel Texture Extraction-Based Compressive Sensing for Lung Cancer Classification. J Med Signals Sens 2022;12:278-284


How to cite this URL:
Irawati ID, Hadiyoso S, Budiman G, Fahmi A, Latip R. A Novel Texture Extraction-Based Compressive Sensing for Lung Cancer Classification. J Med Signals Sens [serial online] 2022 [cited 2023 Jun 4];12:278-284
Available from: https://www.jmssjournal.net/text.asp?2022/12/4/278/360839


Full Text



 Introduction



The advancement of medical devices, especially medical imaging, has been accompanied by an increase in sensing quality, with greater detail and larger data sizes.[1] This requires large memory resources for storage and data transfer. Large memory requirements are also a critical issue in telemedicine, where a variety of medical instruments are connected to the internet to exchange information.[2] Large image file sizes can thus become a new problem in the provision of storage media.[3]

One way to deal with this problem is to apply compression, which reduces the required storage capacity while preserving the quality of the data.[4] Recently, the compressive sensing (CS) technique has received considerable attention as a way to compress information with a high compression ratio compared with other well-established methods. CS extracts significant information by means of a measurement matrix.[5] Donoho first published CS in 2006.[6] The technique samples signals at a rate below the Nyquist rate, so the compression rate is higher than that of conventional sampling methods.[7] Another advantage of CS is that it can carry out the acquisition and reconstruction processes simultaneously.[8]

Beyond compressing the source and retaining its meaningful information, CS techniques are being developed for feature extraction in signal and image classification.[9],[10],[11] This is beneficial because the information generated by CS can be used directly as a feature vector. Several researchers have proposed CS as a feature extraction method in simulations of disease classification or medical diagnosis on signals or images. Studies applying CS to electrocardiogram (ECG) signal compression obtained reconstructions similar to the original signal,[12],[13] which suggests that CS does not discard signal information. CS combined with machine learning for ECG signal classification was simulated in other studies,[14],[15] whose results showed high accuracy on the compressed signal. The use of CS in other medical signal classification tasks, such as epileptic detection from the electroencephalogram (EEG), has also been reported.[16],[17],[18] These studies likewise show that CS delivers high performance in classification applications. The CS approach to image classification has also received attention; rather than only compressing the image, CS is used for feature extraction from the observed images. Recently, CS simulations in image classification or detection were reported by Jokić and Vuković, Islam et al., and Obermeier and Martinez-Lorenzo.[9],[19],[20] Their studies showed that CS is a promising approach for both compression and feature extraction.

The previous study by Irawati et al.[21] proposed a lung cancer image classification algorithm that uses the CS approach for feature extraction combined with K-Nearest Neighbor (KNN) for classification. The performance of that method was tested on the lung and colon cancer LC25000 dataset. This dataset contains 25,000 lung and colon histopathological images in five classes of 5000 images each: adenocarcinoma (ACA), squamous cell carcinoma (SCC), benign lung cancer (N), and two colon cancer classes. In simulations with the three lung classes (ACA, SCC, and N), that method achieved up to 100% accuracy for N classification and 70% for ACA and SCC classification. There is therefore still room to improve accuracy, particularly for the ACA and SCC classes. In this study, a new method for the classification of lung cancer images is developed with the goal of obtaining a high-accuracy system. The novelty of the proposed method is that texture extraction is performed at both the CS acquisition and CS reconstruction stages. The two resulting feature vectors are then combined to become predictors at the classification stage.

The remainder of this paper is organized as follows. Section 2 presents the proposed methods and describes the details of the algorithm. Section 3 explains the dataset used in the study and the simulation results, followed by a discussion. Section 4 presents the conclusions, the limitations of this study, and opportunities for future studies.

 Proposed Texture Extraction



In this section, we enhance the research method of Irawati et al.[21] We propose CS to reduce the number of samples required for medical image diagnosis. The main components of CS are the sparse matrix and the measurement matrix, as shown in [Figure 1]. The sparse matrix can be obtained with the Discrete Cosine Transform (DCT), the wavelet transform, or the Fourier transform. Together, the sparsity process and the measurement matrix compress the data. Since the data are color images, CS reduces the dimensions of lung cancer images consisting of red (R), green (G), and blue (B) intensities. The original lung image is not sparse, so we use the two-dimensional (2-D) DCT as the sparsity transform and apply it to the image block by block. We then convert each sparse 2-D block to a one-dimensional (1-D) signal and multiply it by a uniformly distributed measurement matrix at the chosen measurement rate (MR) to obtain a compressed signal in the CS domain. This dimensional reduction is acceptable because the image can be reconstructed using the Orthogonal Matching Pursuit (OMP) reconstruction algorithm.[22] In the decoder, we use OMP because it is the simplest CS reconstruction method, although it performs well only if the input CS signal is sparse.{Figure 1}
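The acquisition chain described above (block-wise 2-D DCT, flattening to 1-D, multiplication by a uniform random measurement matrix) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the orthonormal DCT-II scaling, the [0, 1) range of the uniform entries, the seed, and the toy 8 × 8 layer are all assumptions made for the example, and the image size is assumed to be a multiple of the block size.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n); C applied to a vector gives its 1-D DCT."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dct2(block):
    """2-D DCT of a square block, computed as C * block * C^T."""
    c = dct_matrix(len(block))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), c_t)


def measurement_matrix(m, n, seed=0):
    """M x N matrix with entries drawn uniformly from [0, 1) (assumed range)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(m)]


def acquire(layer, a, b):
    """Return one length-M measurement vector per B x B block of one color layer."""
    measurements = []
    for top in range(0, len(layer), b):
        for left in range(0, len(layer[0]), b):
            block = [row[left:left + b] for row in layer[top:top + b]]
            x1d = [v for row in dct2(block) for v in row]  # B^2 x 1 sparse signal
            measurements.append([sum(w * v for w, v in zip(a_row, x1d))
                                 for a_row in a])
    return measurements


# Toy 8 x 8 layer (hypothetical data), B = 4, M = 4 measurements per block.
layer = [[(3 * r + 5 * c) % 256 for c in range(8)] for r in range(8)]
A = measurement_matrix(m=4, n=16)
Y = acquire(layer, A, b=4)
```

Here each 4 × 4 block (16 samples) is reduced to 4 measurements, so `Y` holds four vectors of length 4 for the toy layer.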

These images are processed through two stages of texture extraction. Two stages are used because the image characteristics differ from one class to another. The first texture extraction aims to detect all images of type N. To isolate class N, it uses two features of the CS acquisition output: the color intensity on the R layer and the skewness on the G layer. This color feature extraction constitutes stage-1 extraction, whose output is used to retrieve images of type N. The first texture extraction is carried out after the acquisition process with the measurement matrix, which is generated randomly from a uniform distribution. A uniform distribution is simple and effective compared with a nonuniform distribution. With a nonuniform distribution, the values need a normalization step because the minimum and maximum values (the range of the distribution) are not fixed, which adds to the computation.
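The statistics used as texture features are simple to compute. The sketch below shows one common definition of each: the mean intensity and the skewness named above, plus the Shannon entropy that the second stage relies on. The population (biased) form of skewness and the histogram-based entropy over integer gray levels are assumptions; the paper does not state which variants were used.

```python
import math
from collections import Counter


def mean_intensity(values):
    """Average of the measurement (or pixel) values."""
    return sum(values) / len(values)


def skewness(values):
    """Population skewness: third central moment divided by std**3."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def entropy(gray_levels):
    """Shannon entropy in bits of the histogram of integer gray levels."""
    counts = Counter(gray_levels)
    n = len(gray_levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A symmetric sample such as `[1, 2, 3]` has zero skewness, while a layer with two equally frequent gray levels has an entropy of exactly one bit.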

The OMP algorithm recovers the 1-D signals, which are then reshaped into 2-D blocks. The 2-D inverse DCT block computes the 2-D inverse discrete cosine transform of its input. The next process is stage-2 texture extraction. Once the first stage has detected N, the remaining test data are assumed to consist only of ACA and SCC. The second texture extraction is therefore carried out to distinguish between the ACA and SCC types, and the results are stored in Database 2. The second extraction is carried out after reconstruction because no feature that distinguishes the two classes was found in the CS domain; instead, the entropy statistical feature of the G layer is needed to distinguish them. Feature extraction is thus based on three parameters: intensity,[23] skewness,[24] and entropy.[25] Texture extraction is carried out in both the training and testing phases. The extracted features are classified using KNN. The developed method is shown in the block diagram in [Figure 1].
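For readers unfamiliar with OMP, the following is a minimal textbook-style sketch of the greedy recovery loop in plain Python. It is not the authors' code: the function names, the fixed sparsity stopping rule, the normal-equation least-squares step, and the toy matrix are assumptions, and columns of the matrix are assumed to be nonzero.

```python
import math
import random


def solve(g, b):
    """Solve the small square system g x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(g)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def omp(a, y, sparsity):
    """Greedy recovery of a sparse x from y = a x, using `sparsity` iterations."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    support, coef = [], []
    residual = y[:]
    for _ in range(sparsity):
        # Select the unused column most correlated with the current residual.
        scores = [-1.0 if j in support else
                  abs(sum(c * r for c, r in zip(cols[j], residual))) / norms[j]
                  for j in range(n)]
        support.append(max(range(n), key=scores.__getitem__))
        # Least-squares fit of y on the selected columns (normal equations).
        sub = [cols[j] for j in support]
        gram = [[sum(p * q for p, q in zip(u, v)) for v in sub] for u in sub]
        rhs = [sum(p * q for p, q in zip(u, y)) for u in sub]
        coef = solve(gram, rhs)
        residual = [y[i] - sum(cf * col[i] for cf, col in zip(coef, sub))
                    for i in range(m)]
    x = [0.0] * n
    for j, cf in zip(support, coef):
        x[j] = cf
    return x


# Toy example (hypothetical sizes): 6 measurements of a 1-sparse length-12 signal.
rng = random.Random(1)
A = [[rng.random() for _ in range(12)] for _ in range(6)]
x_true = [0.0] * 12
x_true[5] = 3.0
y = [sum(A[i][j] * x_true[j] for j in range(12)) for i in range(6)]
x_hat = omp(A, y, sparsity=1)
```

In the pipeline described here, the recovered vector for each block would be reshaped to B × B and passed through the 2-D inverse DCT before the entropy is computed.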

The performance of the proposed method is measured by several parameters, including peak signal-to-noise ratio (PSNR), compression ratio (CR), and accuracy. PSNR compares the quality of the CS-reconstructed image with that of the original image.[26] CR is the ratio between the dimensions of the original image and the compressed image.[27] Accuracy is the proportion of test images whose predicted class matches their true class.
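As a concrete reading of two of these measures, the sketch below computes PSNR with the standard definition 10·log10(peak² / MSE) and accuracy as the fraction of correct predictions. The 8-bit peak value of 255 and the function names are assumptions for illustration.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)
```

A perfect reconstruction gives an infinite PSNR, and lower values indicate a larger mean squared error relative to the peak intensity.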

The training algorithms are briefly described using the following steps:

1. Read a colored training image X with size P × Q.
2. Generate a random, uniformly distributed matrix A with size M × B².
3. Take the first layer of the image as the red layer of the image, Xr.
4. Apply block processing to the image, producing Xr{i} with block size B × B, where i is the ith block of the image.
5. Change the image block into a 1-D signal by reshaping the block; this step produces Xr1D with size B² × 1.
6. Apply CS acquisition Yr{i} = A Xr1D{i}, producing Yr{i} with size M × 1.
7. Repeat steps 4–6 from i = 1 to e, where e is the last block of Xr{i}. All parts of the image are then compressed, producing Yr.
8. Calculate the intensity average of Yr, producing Ir.
9. Calculate the skewness of Yr, producing Sg.
10. Repeat steps 3–6 for the green layer of the image, Xg, producing Yg.
11. Using block processing, reconstruct Yg{i} by OMP, producing the block-based image Xtg{i}.
12. Combine all blocks of Xtg{i} from 1 to e, producing Xtg.
13. Calculate the entropy of Xtg, producing Eg.
14. Repeat steps 1–13 for the other training image files. If there are M training image files, there will be Ir(j), Sg(j), and Eg(j), with j = 1 to M.
15. Average Ir(j), Sg(j), and Eg(j) for each lung cancer type, producing Ir1, Ir2, Ir3, Sg1, Sg2, Sg3, Eg1, Eg2, and Eg3. Index 1 means ACA type, 2 means N type, and 3 means SCC type. Save all the results to the database: Ir and Sg to DB1, and Eg to DB2.
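The final averaging step of the training phase can be illustrated with a short sketch. The feature values below are made up, and the dictionary layout for DB1 and DB2 is an assumption chosen for readability rather than the authors' storage format.

```python
from collections import defaultdict


def class_means(samples):
    """Average each feature per class.

    samples: iterable of (label, {"Ir": ..., "Sg": ..., "Eg": ...}).
    Returns {feature: {label: mean}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, feats in samples:
        counts[label] += 1
        for name, value in feats.items():
            sums[name][label] += value
    return {name: {label: total / counts[label]
                   for label, total in per_class.items()}
            for name, per_class in sums.items()}


# Hypothetical feature values, two training images per class.
training = [
    ("ACA", {"Ir": 10.0, "Sg": 0.25, "Eg": 6.0}),
    ("ACA", {"Ir": 12.0, "Sg": 0.75, "Eg": 6.5}),
    ("N",   {"Ir": 20.0, "Sg": 1.0,  "Eg": 5.0}),
    ("N",   {"Ir": 22.0, "Sg": 1.5,  "Eg": 5.5}),
    ("SCC", {"Ir": 14.0, "Sg": 0.5,  "Eg": 7.0}),
    ("SCC", {"Ir": 16.0, "Sg": 1.0,  "Eg": 7.5}),
]
means = class_means(training)
db1 = {"Ir": means["Ir"], "Sg": means["Sg"]}  # stage-1 references
db2 = {"Eg": means["Eg"]}                     # stage-2 references
```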

The procedure for the testing phase follows the steps below:

1. Read a colored testing image X with size P × Q.
2. Load the random, uniformly distributed matrix A with size M × B² (the same as in training).
3. Take the first layer of the testing image as the red layer of the image, Xsr.
4. Apply block processing to the image, producing Xsr{i} with block size B × B, where i is the ith block of the image.
5. Change the image block into a 1-D signal by reshaping the block; this step produces Xsr1D with size B² × 1.
6. Apply CS acquisition Ysr{i} = A Xsr1D{i}, producing Ysr{i} with size M × 1.
7. Repeat steps 4–6 from i = 1 to e, where e is the last block of Xsr{i}. All parts of the image are then compressed, producing Ysr.
8. Calculate the intensity average of Ysr, producing Isr.
9. Calculate the skewness of Ysr, producing Ssg.
10. Calculate the minimum distance from Isr and Ssg to Ir1, Ir2, Ir3, Sg1, Sg2, and Sg3 using the following equations: dr = arg min(|Isr − Irk|) and dg = arg min(|Ssg − Sgk|), where k = 1 (ACA), 2 (N), 3 (SCC).
11. If dr and dg are both 2, decide that the testing image file is N.
12. Otherwise, using block processing, reconstruct Ysr{i} by OMP, producing the block-based image Xstg{i}.
13. Combine all blocks of Xstg{i} from 1 to e, producing Xstg.
14. Calculate the entropy of Xstg, producing Esg.
15. Calculate the minimum distance from Esg to Eg1 and Eg3 using the following equation: dg = arg min(|Esg − Egk|), where k = 1 (ACA) and 3 (SCC).
16. If dg = 1, decide that the testing file is ACA type; if dg = 3, decide that the testing file is SCC type.
17. Repeat steps 1–16 for the next testing file.
18. Calculate the accuracy of the detection result.
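The two-stage decision in the testing steps amounts to a nearest-reference rule applied per feature. The sketch below illustrates that logic only; the reference values are hypothetical, class labels replace the indices 1 to 3, and the entropy is passed as a callable (an illustrative design choice) so that reconstruction is performed only when stage 1 does not decide N.

```python
def nearest_class(value, references):
    """Label whose reference value is closest to `value` (arg min of |value - ref|)."""
    return min(references, key=lambda label: abs(value - references[label]))


def classify(isr, ssg, entropy_after_reconstruction, db1, db2):
    """Stage 1 detects N from intensity and skewness; stage 2 separates ACA and SCC."""
    dr = nearest_class(isr, db1["Ir"])
    dg = nearest_class(ssg, db1["Sg"])
    if dr == "N" and dg == "N":
        return "N"
    # Reconstruction and entropy are only needed when stage 1 did not decide N.
    esg = entropy_after_reconstruction()
    stage2 = {label: db2["Eg"][label] for label in ("ACA", "SCC")}
    return nearest_class(esg, stage2)


# Hypothetical reference databases (class means from a training phase).
db1 = {"Ir": {"ACA": 11.0, "N": 20.0, "SCC": 14.0},
       "Sg": {"ACA": 0.3, "N": 1.0, "SCC": 0.5}}
db2 = {"Eg": {"ACA": 6.2, "N": 5.0, "SCC": 7.0}}

first = classify(19.0, 0.9, lambda: 0.0, db1, db2)
second = classify(12.0, 0.35, lambda: 6.9, db1, db2)
third = classify(11.5, 0.3, lambda: 6.0, db1, db2)
```

With these made-up references, the first sample is decided in stage 1, while the other two go through the entropy comparison.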

 Simulation Results and Analysis



Datasets used in the study

We use the LC25000 dataset to evaluate the performance of the proposed method. The dataset consists of lung and colon histopathological images, with 15,000 color images for lung cancer in 3 classes: ACA, SCC, and N.[28] Each class contains 5000 images. All images are saved in JPEG format with a size of 768 × 768 pixels. Sample histopathological images of each type are presented in [Figure 2]. We draw training and testing samples to verify the accuracy of the classification method with the proposed texture extraction and sparse representation with respect to the number of training and testing samples, block dimensions, MR, and CR.{Figure 2}

Comparison accuracy of training data number

In this experiment, we set 300 images as testing data, block size, and MR 64. We tested 1, 10, 20, 30, 40, and 50 training images. The simulation reports the accuracy for the three types of cancer (N, ACA, and SCC) as well as the total, which is the average accuracy over the three types. [Figure 3] shows that increasing the number of training samples gradually increases the classification accuracy for N and for the total. The accuracy of each type is stable once the number of training samples exceeds 30, with accuracy >92% for N, >86% for ACA, and >75% for SCC, so that the total is >84%. These results indicate a higher classification accuracy than the previous study[21] in the ACA and SCC classification cases. For reference, several previous studies by Shen et al., Sun et al., and Masood et al.[29],[30],[31] reported 87.14%, 89.9%, and 89.52% accuracy, respectively, in the classification of lung cancer. However, no conclusion about superiority relative to those studies can be drawn because the datasets used are different. At the least, the proposed method provides a new perspective: CS can serve both compression and feature extraction purposes.{Figure 3}

Comparison accuracy of measurement rate

[Figure 4] compares the accuracy across MRs for the three types of lung cancer. The simulation uses 30 training images, 300 testing images, and a block size of 32 × 32. The results show that MR does not affect accuracy. For an MR >32, the accuracy is above 92% for the N type, above 84% for the ACA and total types, and above 75% for SCC. In this case, an MR of 32 is highly recommended, which will affect the compression results.{Figure 4}

Trade-off between peak signal to noise ratio and compression ratio on measurement rate

[Figure 5] shows the trade-off between PSNR and CR for different MRs. The simulation uses 5 training samples and 300 testing samples. The results show that as MR increases, both PSNR and CR increase. CR increases with MR according to the equation [INSIDE:1].{Figure 5}

The PSNR value increased significantly for MRs from 16 to 32.
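The equation relating CR to MR is not reproduced in this copy of the text. As a hedged reading only: the algorithm steps define the measurement matrix A with size M × B², so each block of B² samples is reduced to M measurements. One relation consistent with that definition, and with CR growing as MR grows, would be the following. It is an assumption, not the authors' stated formula, and it further assumes that MR denotes the number of rows M.

```latex
\mathrm{CR} = \frac{M}{B^{2}} \times 100\%,
\qquad \text{e.g. } M = 64,\ B = 32
\;\Rightarrow\; \mathrm{CR} = \frac{64}{1024} \times 100\% = 6.25\%
```

The 6.25% value coincides with the CR quoted for the block-size experiment, which is suggestive but does not confirm the formula.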

Trade-off between accuracy, peak signal to noise ratio on block size

[Figure 6] shows the trade-off between accuracy and PSNR as a function of block size. In this experiment, we use five training samples, 300 testing samples, and a CR of 6.25%. To determine the effect of the block size, we test sizes of 8, 16, 32, and 64. The classification results are stable for block sizes of 16 and above; that is, the accuracy tends to remain constant without changing significantly. At block size 8, the accuracy is low and changes significantly. In addition, at the 8 × 8 block size, the reconstructed PSNR is much lower than the PSNR for block sizes of 64 × 64 and above; thus, the 8 × 8 block size is not recommended.{Figure 6}

[Table 1] compares the accuracy of the previous study[21] with that of the proposed method. In the previous study,[21] the sparsity techniques used were FFT, IFFT, DWT, and no sparsity scheme, and feature extraction was carried out in one step. In contrast, the proposed method uses the DCT sparsity technique and two-stage feature extraction. The average accuracy of the proposed method is higher than that of the previous study because one-stage feature extraction generates features for all three types of cancer at once, whereas two-stage feature extraction refines the features: the first stage obtains only the features of type N, and the second stage obtains the characteristics of the other types of cancer.{Table 1}

[Table 2] shows the precision, recall, and F1-score of the proposed system. The precision, recall, and F1-score for the N class are higher than those of the other classes. The ACA class has the lowest precision, while the SCC class has the lowest recall and F1-score.{Table 2}
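For reference, per-class precision, recall, and F1-score can be computed from true and predicted labels as in the sketch below. The label lists are hypothetical and are not the values behind [Table 2]; the function name is also illustrative.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels for six test images.
y_true = ["N", "N", "ACA", "ACA", "SCC", "SCC"]
y_pred = ["N", "N", "ACA", "SCC", "ACA", "SCC"]
scores = per_class_metrics(y_true, y_pred, labels=("ACA", "N", "SCC"))
```

In this toy case the N class is classified perfectly, while the confusion between ACA and SCC lowers both of their scores, which mirrors the qualitative pattern described above.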

 Conclusions



This paper proposed CS for dimension reduction, two-stage texture extraction, and KNN for classification. The CS-based texture extraction developed for lung cancer data classification improved the accuracy for each cancer class. The proposed system combines sparsity by 2-D DCT, a uniformly distributed measurement matrix, the OMP reconstruction algorithm, two-stage texture extraction, and KNN classification. Texture extraction is based on the parameters of intensity, skewness, and entropy. The simulation results show that, with 300 testing images, the accuracy is above 92% for type N, above 86% for type ACA, and above 76% for SCC. The proposed method achieves good accuracy, higher than that of the previous study, and saves storage memory. The complexity of the proposed method still needs to be examined in further analysis.

Acknowledgments

The authors would like to thank Prof. A.V. Senthil Kumar from Master of Computer Applications, Hindusthan College of Arts and Science, India for his comments to improve this paper and the three anonymous reviewers for their insightful suggestions and careful reading of the manuscript.

Financial support and sponsorship

This work is financially supported by R&D Telkom University, under grant No. 529/PNLT3/PPM/2020.

Conflicts of interest

There are no conflicts of interest.

References

1. Langer SG. Challenges for data storage in medical imaging research. J Digit Imaging 2011;24:203-7.
2. Jain S, Nehra M, Kumar R, Dilbaghi N, Hu T, Kumar S, et al. Internet of Medical Things (IoMT)-integrated biosensors for point-of-care testing of infectious diseases. Biosens Bioelectron 2021;179:113074.
3. Dandu RV. Storage media for computers in radiology. Indian J Radiol Imaging 2008;18:287-9.
4. Jayasankar U, Thirumal V, Ponnurangam D. A survey on data compression techniques: From the perspective of data quality, coding schemes, data type and applications. J King Saud Univ Comput Inf Sci 2021;33:119-40.
5. Arjoune Y, Kaabouch N, El Ghazi H, Tamtaoui A. A performance comparison of measurement matrices in compressive sensing. Int J Commun Syst 2018;31;e3576:1-18. [doi: 10.1002/dac.3576].
6. Donoho DL. Compressed sensing. IEEE Trans Inf Theory 2006;52:1289-306.
7. Quinsac C, Basarab A, Kouamé D. Frequency domain compressive sampling for ultrasound imaging. Adv Acoust Vib 2012;2012:1-17.
8. Li L, Fang Y, Liu L, Peng H, Kurths J, Yang Y. Overview of compressed sensing: Sensing model, reconstruction algorithm, and its applications. Appl Sci 2020;10:1-19.
9. Jokić A, Vuković N. License plate recognition with compressive sensing based feature extraction. Arxiv 2019;abs/1902.05386:1-4.
10. Ren K, Du L, Wang B, Li Q, Chen J. Statistical compressive sensing and feature extraction of time-frequency spectrum from narrowband radar. IEEE Trans Aerosp Electron Syst 2020;56:326-42.
11. Eleyan A, Kose K, Cetin AE. Image feature extraction using compressive sensing. In: Advance Intelligent System Computing. Switzerland; Springer International Publishing. Vol. 233. 2014.
12. Luo K, Cai Z, Du K, Zou F, Zhang X, Li J. A digital compressed sensing-based energy-efficient single-spot bluetooth ECG node. J Healthc Eng 2018;2018:1-11.
13. Fira CM, Goras L, Barabasa C, Cleju N. ECG compressed sensing based on classification in compressed space and specified dictionaries. In: European Signal Processing Conference. Barcelona, Spain; IEEE; 2011. p. 1573-7.
14. Chou CY, Pua YW, Sun TW, Wu AA. Compressed-domain ECG-based biometric user identification using compressive analysis. Sensors (Basel) 2020;20:3279.
15. Cheng Y, Hu Y, Hou M, Pan T, He W, Ye Y. Atrial fibrillation detection directly from compressed ECG with the prior of measurement matrix. Inf 2020;11:1-15.
16. Mohammad GA, Aghababaei H, O'Toole JM. Detection of epileptic seizures from compressively sensed EEG signals for wireless body area networks. Expert Syst Appl 2021;172:1-17.
17. Abualsaud K, Mahmuddin M, Saleh M, Mohamed A. Ensemble classifier for epileptic seizure detection for imperfect EEG data. ScientificWorldJournal 2015;2015:945689.
18. Zeng K, Yan J, Wang Y, Sik A, Ouyang G, Li X. Automatic detection of absence seizures with compressive sensing EEG. Neurocomputing 2016;171:497-502.
19. Islam SR, Maity SP, Ray AK, Mandal M. Automatic detection of pneumonia on compressed sensing images using deep learning. In: 2019 IEEE Canadian Conference of Electrical and Computer Engineering, CCECE 2019. Edmonton, Canada; IEEE; 2019. p. 1-4. [doi: 10.1109/CCECE.2019.8861969].
20. Obermeier R, Martinez-Lorenzo JA. Compressive sensing unmixing algorithm for breast cancer detection. IET Microw Antennas Propag 2018;12:533-41.
21. Irawati ID, Hadiyoso S, Fahmi A. Compressive sensing in lung cancer images for telemedicine application. In: 2021 International Conference on Electronics, Communications and Control Engineering, ICECC 2021. Seoul, Republic of Korea; ACM Digital Library; 2021.
22. Zhang H, Xiao S, Zhou P. A matching pursuit algorithm for backtracking regularization based on energy sorting. Symmetry (Basel) 2020;12:1-12.
23. Wang Y, Yi Z. Research on image intensity based on Matlab. In: Advances in Intelligent Systems and Computing. Vol. 180. Springer, Berlin, Heidelberg: AISC; 2013. p. 101-7.
24. Ma B, Yao J, Le Y, Qin C, Yao H. Efficient image noise estimation based on skewness invariance and adaptive noise injection. IET Image Process 2020;14:1393-401.
25. Sparavigna AC. Entropy in image analysis. Entropy 2020;22:1-4.
26. Sara U, Akter M, Uddin MS. Image quality assessment through FSIM, SSIM, MSE and PSNR – A comparative study. J Comput Commun 2019;7:8-18.
27. Abd-Elhafiez WM, Gharibi W, Heshmat M. An efficient color image compression technique. Telkomnika Telecommunication Comput Electron Control 2020;18:2371-7.
28. Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM. Lung and colon cancer histopathological image dataset (LC25000). Arxiv 2019; New York, USA; Cornell University.
29. Shen W, Zhou M, Yang F, Yu D, Dong D, Yang C, et al. Multi-crop Convolutional Neural Networks for lung nodule malignancy suspiciousness classification. Pattern Recognit 2016;61:663-73.
30. Sun W, Zheng B, Qian W. Automatic feature learning using multichannel ROI based on deep structured algorithms for computerized lung cancer diagnosis. Comput Biol Med 2017;89:530-9.
31. Masood A, Sheng B, Li P, Hou X, Wei X, Qin J, et al. Computer-assisted decision support system in pulmonary cancer detection and stage classification on CT images. J Biomed Inform 2018;79:117-28.