Application of Data Mining Classification in Predicting Length of Stay for an Inpatient in Hospital

Mr. Nikhil Salvithal (nnsalvithal@coe.sveri.ac.in), Assistant Professor, Department of Computer Science & Engineering, S.V.E.R.I.’s College of Engineering, Pandharpur

Hospitals today face severely limited resources, including the beds needed to hold admitted patients. If the length of stay (LOS) of an inpatient can be predicted, the hospital administration can plan and manage those resources better. Predicting LOS is a challenging task, but it is beneficial for the operational success of a hospital: it supports profitability, patient care, and process efficiency. LOS is an important measure of health care utilization and a determinant of hospitalization costs, so health care organizations are interested in early and accurate LOS predictions for both economic and organizational reasons.

Laboratory test results from electronic discharge summaries can be used to predict LOS. These tests are performed on the patient on the first day of admission. Data mining is the process of finding useful patterns in data. Classification is a popular data mining technique that predicts the target class for each record in a given dataset. A classification algorithm takes a training dataset of sample records as input and generates classification rules as its output.
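To make the "training records in, rules out" idea concrete, here is a minimal sketch of a one-attribute rule learner in Python. It is not one of the algorithms evaluated in this post; the function names and the training records are hypothetical and serve only as an illustration.

```python
from collections import Counter, defaultdict

def train_one_rule(records, attribute, target):
    """Learn one rule per attribute value: value -> most frequent target class."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[attribute]][record[target]] += 1
    return {value: classes.most_common(1)[0][0] for value, classes in counts.items()}

def predict(rules, value, default=None):
    """Apply the learned rules; fall back to `default` for unseen values."""
    return rules.get(value, default)

# Hypothetical training records, invented for illustration only.
training = [
    {"temperature": "AFEBRILE", "los": "0-7 days"},
    {"temperature": "AFEBRILE", "los": "0-7 days"},
    {"temperature": "FEBRILE", "los": "7-14 days"},
    {"temperature": "FEBRILE", "los": "15-30 days"},
    {"temperature": "FEBRILE", "los": "7-14 days"},
]

rules = train_one_rule(training, "temperature", "los")
```

Each entry in `rules` reads as "if temperature is X, predict class Y". Real classifiers combine many attributes, but the input and output have the same shape.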

In data preparation, the medical data have to be collected and integrated to give a uniform view. Collecting alone is not enough: the data must also be filtered and prepared before useful work can be carried out on them. Data modelling involves analysing the factors that affect the length of stay and weighting each parameter accordingly. Once the parameters are set, a model architecture is developed, then trained and tested with the data samples.

The clinical dataset consists of patient physiological measurements, demographic details, and lab test results, such as age, temperature, pulse, BP, respiratory rate, pH, sodium, urea, glucose, bilirubin, albumin, creatinine, urine, WBC, and HCT. The training dataset is used as input to train the classifier.

Sample Dataset:

Age	38	46
Temperature	AFEBRILE	AFEBRILE
Pulse	84	82
BP	130/80	130/100
Respiratory Rate	25	34
PaO2	80	69
PH	1	3
Sodium	134	133
Glucose	99	97
Creatinine	0.9	0.8
Urea	19	22
Urine	1899	399
Albumin	4.8	4.6
Bilirubin	1.8	0.6
HCT	35.5	41.5
WBC	25	19.9
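As a rough sketch of the data preparation step, the snippet below turns a tab-separated table like the one above into one record per patient. It assumes each value column is one patient record; `parse_value` and `parse_records` are hypothetical helper names, and only the first four rows are included to keep the example short.

```python
# First four rows of the sample table, tab-separated (one column per patient).
SAMPLE = "Age\t38\t46\nTemperature\tAFEBRILE\tAFEBRILE\nPulse\t84\t82\nBP\t130/80\t130/100\n"

def parse_value(name, text):
    """Convert a raw cell to a usable value."""
    if name == "BP":
        # Blood pressure is written as systolic/diastolic.
        systolic, diastolic = text.split("/")
        return (int(systolic), int(diastolic))
    try:
        return float(text)
    except ValueError:
        return text  # categorical value such as AFEBRILE

def parse_records(table_text):
    """Return a list of dicts, one per patient column."""
    rows = [line.split("\t") for line in table_text.strip().splitlines()]
    n_patients = len(rows[0]) - 1
    records = [{} for _ in range(n_patients)]
    for name, *values in rows:
        for record, text in zip(records, values):
            record[name] = parse_value(name, text)
    return records

records = parse_records(SAMPLE)
```

Splitting BP into two numbers matters because a classifier cannot compare strings like "130/80" numerically.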

Acute Physiology and Chronic Health Evaluation (APACHE) is one of the most commonly used scoring systems for grading the severity of illness in critically ill patients. It is one of several ICU scoring systems; others include the Simplified Acute Physiology Score II (SAPS II) and several mortality probability models. APACHE II, introduced in 1985, computes an integer score from 0 to 71 after a patient is admitted to an intensive care unit; higher scores imply more severe disease and a higher risk of death. The point score is calculated from 12 routine physiological measurements (such as blood pressure, body temperature, and heart rate) taken during the first 24 hours after admission, information about previous health status, and some information obtained at admission (such as age). The resulting score should always be interpreted in relation to the patient's illness. Researchers have used the APACHE scoring system to assign a score to each actual parameter value: APACHE assigns 0 to a value in the normal range, and a larger number the further a value deviates from normal.

Sample dataset with actual values and its corresponding APACHE scores:

Parameter	Actual Value	APACHE Score value
Age	55	5
Temperature	AFEBRILE	0
Pulse	110	5
BP	110/70	0
Respiratory Rate	25	0
PaO2	80	0
PH	1	3
Sodium	131	2
Glucose	79	0
Creatinine	1.4	0
Urea	65	11
Urine	1899	4
Albumin	3.3	0
Bilirubin	3.7	6
HCT	35.6	3
WBC	25	5
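The mapping from an actual value to a score can be sketched as a lookup over value bands. The thresholds below are placeholders chosen for illustration, not the clinical APACHE tables, and the names `ILLUSTRATIVE_BANDS` and `score` are hypothetical.

```python
# Each band is (inclusive upper bound, points), sorted by upper bound.
# PLACEHOLDER thresholds for illustration only; these are NOT clinical APACHE tables.
ILLUSTRATIVE_BANDS = {
    "Pulse": [(49, 5), (99, 0), (109, 1), (float("inf"), 5)],
    "Sodium": [(119, 3), (134, 2), (154, 0), (float("inf"), 4)],
}

def score(parameter, value, bands=ILLUSTRATIVE_BANDS):
    """Return the points of the first band whose upper bound covers the value."""
    for upper, points in bands[parameter]:
        if value <= upper:
            return points
    raise ValueError(f"no band covers {parameter}={value}")

# Score two measurements and add them up, as a severity score would.
measurements = {"Pulse": 110, "Sodium": 131}
total = sum(score(name, value) for name, value in measurements.items())
```

A value inside the normal band contributes 0, and values further from normal contribute more points, which is the behaviour described above.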

For ease of analysis, the duration-of-stay variable can be categorized into three groups, for example 0-7 days, 7-14 days, and 15-30 days. These three groups form the output classes for classification. Weka is a popular, free data mining tool that offers a variety of classifiers, which makes it convenient for running classification experiments. Researchers have investigated four data mining techniques provided by the Weka machine learning environment: a multilayer back-propagation neural network (MLP), the Naive Bayes classifier, the k-NN method, and J48, Weka's implementation of the C4.5 decision tree.
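The grouping of LOS into output classes can be sketched as a small function. Because the first two groups both mention day 7, this sketch assumes a stay of exactly 7 days belongs to the first group, and that stays are whole days up to 30; `los_class` is a hypothetical name.

```python
def los_class(days):
    """Map a length of stay in whole days to one of the three output classes.

    Assumption: exactly 7 days counts as "0-7 days", so the second
    group effectively covers 8 to 14 days.
    """
    if days < 0 or days > 30:
        raise ValueError(f"length of stay out of range: {days}")
    if days <= 7:
        return "0-7 days"
    if days <= 14:
        return "7-14 days"
    return "15-30 days"

labels = [los_class(d) for d in (3, 7, 10, 21)]
```

Turning the numeric stay into a class label is what makes the problem a classification task rather than a regression task.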

Classification technique	Correctly Classified instances (accuracy)
MLP	87.8%
Naive Bayes	85.8%
K-NN	62.6%
J48	75.1%
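The accuracy column is the percentage of test records whose predicted class matches the actual class. The sketch below shows that calculation with a from-scratch k-NN classifier in Python (it needs Python 3.8+ for `math.dist`). It does not reproduce the Weka experiment: the score vectors, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the majority class among the k nearest training vectors."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Percentage of test records classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return 100.0 * correct / len(test)

# Hypothetical score vectors with their LOS class, invented for illustration.
train = [
    ((0, 0, 0), "0-7 days"),
    ((0, 1, 0), "0-7 days"),
    ((2, 0, 1), "0-7 days"),
    ((5, 5, 11), "15-30 days"),
    ((5, 7, 7), "15-30 days"),
    ((3, 5, 11), "15-30 days"),
]
test = [
    ((1, 0, 0), "0-7 days"),
    ((5, 5, 7), "15-30 days"),
    ((4, 4, 8), "0-7 days"),  # lies near the long-stay vectors, so it is misclassified
]

result = accuracy(train, test)
```

With these toy records, two of the three test cases are classified correctly, so `result` is about 66.7%.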

The experiment found that the multilayer perceptron (MLP) achieved the highest accuracy (87.8%), narrowly ahead of Naive Bayes (85.8%) and well ahead of J48 (75.1%) and k-NN (62.6%). In this experiment, the MLP classifier was therefore the best of the four for predicting the length of stay of an inpatient.

References:

1. Dr. U. Dinesh Acharya, Shailesh K R. Comparison of different data mining techniques to predict hospital length of stay. Journal of Pharmaceutical and Biomedical Sciences (JPBMS), 2011, 7 (15).

2. Zimmerman, JE, et al. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today’s critically ill patients. Crit Care Medicine, 2006, pp. 1297-310.

3. Liu, Peng, et al. Healthcare Data Mining: Prediction of Inpatient Length of Stay. 3rd International IEEE Conference on Intelligent Systems, 2006, pp. 832-837.

4. Nikhil N. Salvithal, Dr. R. B. Kulkarni. Evaluating Performance of Data Mining Classification Algorithm in Weka. International Journal of Application or Innovation in Engineering & Management (IJAIEM), volume 2, issue 10, 2013, pp. 273-281.

5. http://weka.wikispaces.com/Primer
