A One-Dimensional PCA Approach for Classifying Imbalanced Data

Derrick KR; Varayini Pankayatselvan

Abstract

Background: Highly complex and computationally intensive methods based on the Synthetic Minority Over-sampling Technique (SMOTE) and, more recently, Learning Vector Quantization SMOTE (LVQ-SMOTE) have been proposed for classifying imbalanced biomedical data. This work presents a much simpler approach that is not computationally intensive and competes well with existing approaches. It uses principal component analysis (PCA) to generate a single pseudo-variable as a linear combination of the features. From this one pseudo-variable, several classification methods are developed that classify cases directly using very simple statistics. One method, the Mean Method (MM), assigns each case to the class whose mean, estimated from the training data, is closer to the case's pseudo-variable value. When the number of features is very large, a feature reduction (FR) procedure is proposed to reduce misclassifications. When the means of the two classes are similar but their spreads about those means differ, the Spread Method (SM) is proposed. A distinctive feature of the SM is that classification accuracy can be traded between the two classes by changing the width of the window used to allocate cases. The proposed methods are found to perform well without over-sampling techniques or multiple-fold cross-validation.
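The ideas in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details, so everything here is an assumption made for illustration. The function names (`first_component`, `fit_mean_method`, `predict_mean_method`, `predict_spread_method`) are hypothetical, the first principal component is approximated by power iteration on the pooled training data, and the Spread Method is read as "assign a case to the narrow-spread class if its pseudo-variable value falls inside a window around a chosen center, otherwise to the wide-spread class".

```python
import math


def first_component(rows, iters=200):
    """Return (feature means, unit vector approximating the first principal component)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    # Power iteration on X^T X (X = centered data); the 1/(n-1) factor does not
    # change the direction. Assumes the start vector is not orthogonal to the
    # leading component.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mu, v


def project(row, mu, v):
    """Pseudo-variable: a linear combination of the centered features."""
    return sum((x - m) * c for x, m, c in zip(row, mu, v))


def fit_mean_method(rows, labels):
    """Fit the one-dimensional projection and the two class means (labels 0 and 1)."""
    mu, v = first_component(rows)
    scores = [project(r, mu, v) for r in rows]
    means = {}
    for cls in (0, 1):
        vals = [s for s, y in zip(scores, labels) if y == cls]
        means[cls] = sum(vals) / len(vals)
    return mu, v, means


def predict_mean_method(row, model):
    """Assign the class whose training mean is closest to the case's pseudo-variable value."""
    mu, v, means = model
    s = project(row, mu, v)
    return min(means, key=lambda c: abs(s - means[c]))


def predict_spread_method(score, center, half_width, narrow_cls=0, wide_cls=1):
    """Assumed window rule: inside the window -> narrow-spread class, outside -> wide-spread class."""
    return narrow_cls if abs(score - center) <= half_width else wide_cls
```

Under this reading, widening `half_width` allocates more cases to the narrow-spread class and narrowing it allocates more to the wide-spread class, which is one way the trade-off between the two class accuracies could be tuned.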

Results: The MM, or the MM with FR, was compared directly with recently published results for LVQ-SMOTE on six data sets and gave better or much better results in every case, as measured by the sum of the percentage of true positives and the percentage of true negatives. The SM was compared with LVQ-SMOTE on two data sets, and operating window widths were found for which the SM gave much better results than LVQ-SMOTE.
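The comparison measure used here, the percentage of true positives plus the percentage of true negatives, can be computed as in the short sketch below. The function name `tp_plus_tn_percent` and the convention that label 1 is the positive class are assumptions made for illustration; the score ranges from 0 to 200.

```python
def tp_plus_tn_percent(y_true, y_pred, positive=1):
    """Sum of the true positive rate and true negative rate, each expressed in percent."""
    pairs = list(zip(y_true, y_pred))
    n_pos = sum(1 for t, _ in pairs if t == positive)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    # Correctly classified cases within each class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return 100.0 * tp / n_pos + 100.0 * tn / n_neg
```

Because each class contributes its own percentage, a classifier that labels every case as the majority class scores only 100 on this measure, regardless of how imbalanced the data are.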

Conclusion: Given the simplicity, strengths, and performance of the proposed approach relative to current methods, these methods and procedures are recommended for classifying imbalanced biomedical data.

Journal of Computer Science & Systems Biology