Application of Data Mining Classification in Predicting Length of Stay for an Inpatient in Hospital
Mr. Nikhil Salvithal (nnsalvithal@coe.sveri.ac.in), Assistant Professor, Department of Computer Science & Engineering, S.V.E.R.I.’s College of Engineering, Pandharpur
Hospitals today face severely limited resources, including beds for admitted patients. If the length of stay (LOS) of an inpatient can be predicted, hospital administration can plan and manage these resources better. Predicting LOS for an inpatient is a challenging task, but it is beneficial for the operational success of a hospital: it supports profitability, patient care, and process efficiency. LOS is an important measure of health care utilization and a determinant of hospitalization costs. Health care organizations are interested in early and accurate LOS predictions for both economic and organizational reasons.
Laboratory test results from electronic discharge summaries can be used for predicting LOS. These tests are performed on the patient on the first day of admission. Data mining is the process of finding useful patterns in data. Classification is a popular data mining technique that predicts a target class for each record in a given dataset. A classification algorithm takes a training dataset of sample records as input and generates classification rules as its output.
In data preparation, the medical data have to be collected and integrated to give a uniform view of the data. Collecting alone is not enough: the data must also be filtered and prepared before useful work can be carried out on it. Data modelling involves analysing the factors affecting the length of stay and weighting each parameter accordingly. After the parameters are set, a model architecture is developed, which is then trained and tested on the data samples.
The clinical dataset consists of parameters that include patient physiological measurements, demographic details, and lab test results, such as age, temperature, pulse, BP, respiratory rate, pH, sodium, urea, glucose, bilirubin, albumin, creatinine, urine, WBC, and HCT. The training dataset is used as input to train the classifier.
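To make the idea of "training records in, classification rules out" concrete, here is a minimal sketch of OneR, a very simple rule learner. It is not one of the four techniques compared later in this article, and the records, feature names, and class labels below are hypothetical, hand-made values rather than data from the study.

```python
from collections import Counter, defaultdict


def one_r(records, features, target):
    """Learn a OneR rule set: pick the single feature whose
    value -> majority-class rules make the fewest training errors."""
    best = None
    for feature in features:
        # Count the target classes seen for each value of this feature.
        counts = defaultdict(Counter)
        for rec in records:
            counts[rec[feature]][rec[target]] += 1
        # Each feature value predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for rec in records if rules[rec[feature]] != rec[target])
        if best is None or errors < best[2]:
            best = (feature, rules, errors)
    return best


# Hypothetical training records (illustration only, not study data).
training = [
    {"urea": "high", "pulse": "normal", "los": "long"},
    {"urea": "high", "pulse": "high", "los": "long"},
    {"urea": "normal", "pulse": "high", "los": "short"},
    {"urea": "normal", "pulse": "normal", "los": "short"},
    {"urea": "normal", "pulse": "high", "los": "short"},
]

feature, rules, errors = one_r(training, ["urea", "pulse"], "los")
print(feature, rules, errors)
```

The learned rules are just a dictionary from feature value to predicted class, which is the simplest form the "classification rules" mentioned above can take.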
Sample dataset (two patient records):
Parameter | Record 1 | Record 2 |
Age | 38 | 46 |
Temperature | AFEBRILE | AFEBRILE |
Pulse | 84 | 82 |
BP | 130/80 | 130/100 |
Respiratory Rate | 25 | 34 |
PaO2 | 80 | 69 |
PH | 1 | 3 |
Sodium | 134 | 133 |
Glucose | 99 | 97 |
Creatinine | 0.9 | 0.8 |
Urea | 19 | 22 |
Urine | 1899 | 399 |
Albumin | 4.8 | 4.6 |
Bilirubin | 1.8 | 0.6 |
HCT | 35.5 | 41.5 |
WBC | 25 | 19.9 |
Acute Physiology and Chronic Health Evaluation (APACHE) is one of the most commonly used scoring systems for grading the severity of illness in critically ill patients, and one of several ICU scoring systems. Others include the Simplified Acute Physiology Score II (SAPS II) and several mortality probability models. APACHE II, introduced in 1985, computes an integer score from 0 to 71 after a patient is admitted to an intensive care unit; higher scores imply more severe disease and a higher risk of death. The point score is calculated from 12 routine physiological measurements (such as blood pressure, body temperature, and heart rate) taken during the first 24 hours after admission, together with information about previous health status and some information obtained at admission (such as age). The resulting score should always be interpreted in relation to the illness of the patient. Researchers have used the APACHE scoring system to assign a score value to each actual parameter value: APACHE assigns 0 to a value in the normal range and a larger number to a value that deviates further from normal.
Sample dataset with actual values and its corresponding APACHE scores:
Parameter | Actual Value | APACHE Score value |
Age | 55 | 5 |
Temperature | AFEBRILE | 0 |
Pulse | 110 | 5 |
BP | 110/70 | 0 |
Respiratory Rate | 25 | 0 |
PaO2 | 80 | 0 |
PH | 1 | 3 |
Sodium | 131 | 2 |
Glucose | 79 | 0 |
Creatinine | 1.4 | 0 |
Urea | 65 | 11 |
Urine | 1899 | 4 |
Albumin | 3.3 | 0 |
Bilirubin | 3.7 | 6 |
HCT | 35.6 | 3 |
WBC | 25 | 5 |
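The mapping from an actual value to a score (0 for a normal value, more points for a larger deviation) can be sketched as a band lookup. The band limits below are placeholders invented for illustration, chosen only so that the pulse and sodium rows of the table above are reproduced; they are not the published APACHE tables, and the names `HYPOTHETICAL_BANDS`, `score`, and `total_score` are assumptions of this sketch.

```python
INF = float("inf")

# Placeholder bands: (lower bound inclusive, upper bound exclusive, points).
# NOT the published APACHE thresholds; illustration only.
HYPOTHETICAL_BANDS = {
    "pulse": [(0, 50, 5), (50, 100, 0), (100, 120, 5), (120, INF, 10)],
    "sodium": [(0, 120, 3), (120, 135, 2), (135, 155, 0), (155, INF, 4)],
}


def score(parameter, value):
    """Return the points of the band that contains the value."""
    for low, high, points in HYPOTHETICAL_BANDS[parameter]:
        if low <= value < high:
            return points
    raise ValueError(f"no band for {parameter}={value}")


def total_score(measurements):
    """Sum the per-parameter scores of one patient record."""
    return sum(score(name, value) for name, value in measurements.items())


# The two rows from the table above that this sketch covers.
patient = {"pulse": 110, "sodium": 131}
print({name: score(name, value) for name, value in patient.items()})
print(total_score(patient))
```

Keeping the bands in a table rather than in nested conditionals means a real scoring scheme could be substituted by changing the data only.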
For ease of analysis, the duration-of-stay variable can be categorized into three groups, for example 0-7 days, 7-14 days, and 15-30 days. These three groups form the output classes for classification. WEKA is a popular, free data mining tool that provides a variety of classifiers, which makes it well suited to classification experiments. Researchers have investigated four data mining techniques provided by the WEKA machine learning environment: a multilayer back-propagation neural network (MLP), the Naive Bayes classifier, the k-nearest neighbour (K-NN) method, and J48, the WEKA implementation of the C4.5 decision tree.
Classification technique | Correctly Classified instances (accuracy) |
MLP | 87.8% |
Naive Bayes | 85.8% |
K-NN | 62.6% |
J48 | 75.1% |
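The three output classes and the accuracy measure reported in the table above (correctly classified instances as a percentage of all instances) can be sketched as follows. The text lists 7 days in both of the first two groups, so this sketch assumes a 7-day stay belongs to the first group. The stays and predictions are hypothetical and are not results from the experiment.

```python
def los_class(days):
    """Map a length of stay in days to one of the three output classes."""
    if not 0 <= days <= 30:
        raise ValueError(f"length of stay out of range: {days}")
    if days <= 7:  # assumption: a 7-day stay falls in the first group
        return "0-7 days"
    if days <= 14:
        return "7-14 days"
    return "15-30 days"


def accuracy(actual, predicted):
    """Percentage of instances whose predicted class equals the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


# Hypothetical stays and classifier output, for illustration only.
actual_days = [3, 7, 10, 21]
actual = [los_class(d) for d in actual_days]
predicted = ["0-7 days", "0-7 days", "15-30 days", "15-30 days"]
print(accuracy(actual, predicted))
```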
In this experiment, the multilayer perceptron (MLP) achieved the highest accuracy (87.8%), slightly ahead of Naive Bayes (85.8%) and well ahead of J48 (75.1%) and K-NN (62.6%). On this evidence, MLP is the most suitable of the four classifiers for predicting length of stay for an inpatient.
References:
1. Dr. U. Dinesh Acharya, Shailesh K R. Comparison of different data mining techniques to predict hospital length of stay. Journal of Pharmaceutical and Biomedical Sciences (JPBMS), 2011, 7 (15).
2. Zimmerman, JE, et al. Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today’s critically ill patients. Crit Care Medicine, 2006, pp. 1297-310.
3. Liu, Peng, et al. Healthcare Data Mining: Prediction of Inpatient Length of Stay. 3rd International IEEE Conference on Intelligent Systems, 2006, pp. 832-837.
4. Nikhil N. Salvithal, Dr. R. B. Kulkarni. Evaluating Performance of Data Mining Classification Algorithm in Weka. International Journal of Application or Innovation in Engineering & Management (IJAIEM), volume 2, issue 10, 2013, pp. 273-281.
5. http://weka.wikispaces.com/Primer