Challenge #1 README:  Predicting Hospital Readmissions

In this challenge, your team will try to predict which patients will be re-admitted to the hospital after being discharged from a hospital stay.  This is a real problem for hospitals, which don’t get paid for a re-admission if it happens less than 30 days after the patient was discharged.  There are a million companies trying to develop algorithms to do this, so here is your chance to show your stuff.  You will be given a training data set made up of several files, and will need to assemble them and train your algorithm.  You are free to use any algorithm that your team codes, or to combine several.  On Sunday at 9:00 AM, we will release the validation data set (which does not tell you who got re-admitted), and you will run your algorithm and submit your predictions.  The team with the most accurate predictions, lowest false-negative rate, best presentation of their results and method, and best Deep Magic approach wins $100.

Data Description:

  1. Training Set:  This file (Challenge_1_Training.Set.csv) contains 70,000 HIPAA-compliant, de-identified records of hospital admissions.  Each record contains a random, one-way scrambled unique identifier, limited demographics (age, gender), type of admission, discharge disposition (e.g. home, to a skilled nursing facility, home with assistance, transfer to another facility), whether the person was re-admitted, and when the re-admission occurred relative to 30 days after discharge.  The READ_ME.doc file contains each field definition, as well as additional definitions for Admission Type (admission_type_id.csv), Discharge Disposition (discharge_disposition_id.csv), and Admission Source (admission_source_id.csv).
  2. Look-up tables for ICD-9 Diagnosis:  This zipped file (version 32) from the Center for Medicare Services web site contains two tables, one with the ICD-9 Diagnosis codes (CMS32_DESC_LONG_DX.txt) and one with the Procedure codes (CMS32_DESC_LONG_SG.txt).  The ICD-9 Diagnosis table provides a description of the numerical Diagnosis codes contained in the Challenge_1_Training.Set.csv file.  You can use these tables if you want to understand the codes and/or deepen your analysis of re-admission causes.
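
Assembling the training set with a look-up table might look like the sketch below.  It uses only the Python standard library.  The column names (id, admission_type_id, description) and the sample values are assumptions made for illustration; check READ_ME.doc for the real field names before adapting it.

```python
import csv
import io


def load_lookup(handle, key_col, value_col):
    """Read a look-up CSV into a dict mapping code -> description."""
    return {row[key_col].strip(): row[value_col].strip()
            for row in csv.DictReader(handle)}


def attach_description(records, lookup, key_col, out_col):
    """Return copies of the records with a description column added.

    Codes missing from the look-up get an empty description instead of
    raising, so unexpected codes do not drop rows.
    """
    return [{**rec, out_col: lookup.get(rec[key_col].strip(), "")}
            for rec in records]


# Tiny in-memory stand-ins for the real files; all values are made up,
# and the column names are assumptions (see READ_ME.doc for the real ones).
lookup_csv = io.StringIO("admission_type_id,description\n1,Emergency\n2,Urgent\n")
training_csv = io.StringIO("id,admission_type_id\nA1,1\nB2,2\nC3,9\n")

admission_types = load_lookup(lookup_csv, "admission_type_id", "description")
training_rows = list(csv.DictReader(training_csv))
joined = attach_description(training_rows, admission_types,
                            "admission_type_id", "admission_type")
```

With the real files, replace the io.StringIO objects with open("admission_type_id.csv", newline="") and open("Challenge_1_Training.Set.csv", newline="").  The same two functions should work for the discharge disposition and admission source tables, provided their columns follow a similar layout.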

Validation Data Set for Competition:

 On Sunday at 9:00 AM we will post the Readmission Challenge File Set (Challenge_1_Validation_Set.csv).  Each team needs to download the files and run their algorithms.

Teams then need to upload their readmission prediction results by 12:00 PM, using the link you will be given.  Results must be in tab-delimited form (.txt files) with the following simple structure:

ID <tab> PRED <tab> TIME

where PRED = {0=not readmitted, 1=readmitted} and TIME = {0 = not readmitted, 1 = readmitted <30 days after discharge, 2 = readmitted > 30 days after discharge}.
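
One way to produce a file in this layout is sketched below.  It derives PRED from TIME so the two columns cannot disagree.  The function names and sample IDs are hypothetical, and the sketch assumes no header row is expected, which the instructions above do not specify.

```python
import csv
import io


def make_row(patient_id, time_code):
    """Build one (ID, PRED, TIME) row; PRED is derived from TIME."""
    if time_code not in (0, 1, 2):
        raise ValueError(f"TIME must be 0, 1, or 2, got {time_code!r}")
    pred = 0 if time_code == 0 else 1
    return (patient_id, pred, time_code)


def write_submission(handle, predictions):
    """Write (patient_id, time_code) pairs as tab-delimited rows.

    No header row is written; that is an assumption, not a stated rule.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for patient_id, time_code in predictions:
        writer.writerow(make_row(patient_id, time_code))


# Made-up IDs and predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission(buffer, [("A1", 0), ("B2", 1), ("C3", 2)])
submission_text = buffer.getvalue()
```

To write a real file, pass open("predictions.txt", "w", newline="") instead of the buffer (the file name here is only an example).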

Judging Criteria:

 Teams will be judged based on three criteria: (1) Prediction – How close your prediction came to the true readmission values, along with false positive and false negative rates.  You want to have a low false negative rate. (2) Presentation  (3) Deep Magic – did you use 

Criterion #1:  Predictive Accuracy –  We will compare your results with the actual data and calculate an accuracy (how close you are to the actual readmission count), a false-positive rate (you predict readmission, but the patient was not actually re-admitted), and a false-negative rate (you predict no readmission, but the patient actually was re-admitted).
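
To check your own model on held-out training data before Sunday, a rough sketch of these measures follows.  The exact formulas the judges will use are not stated here, so the definitions below are assumptions: the false-positive rate is taken as FP / (FP + TN), the false-negative rate as FN / (FN + TP), and both the readmission counts and the fraction of correct predictions are reported.

```python
def score_predictions(predicted, actual):
    """Tally outcomes for binary labels (1 = readmitted, 0 = not)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, actual):
        if pred == 1 and truth == 1:
            tp += 1          # predicted readmission, patient was readmitted
        elif pred == 1 and truth == 0:
            fp += 1          # predicted readmission, patient was not readmitted
        elif pred == 0 and truth == 0:
            tn += 1          # predicted no readmission, correctly
        elif pred == 0 and truth == 1:
            fn += 1          # missed a readmission
        else:
            raise ValueError(f"labels must be 0 or 1, got {pred!r}, {truth!r}")

    def ratio(num, den):
        # Avoid dividing by zero when a class is absent from the data.
        return num / den if den else 0.0

    return {
        "predicted_readmissions": tp + fp,
        "actual_readmissions": tp + fn,
        "fraction_correct": ratio(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Made-up labels for illustration only.
result = score_predictions([1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1])
```

Since a low false-negative rate is what you want, it may be worth tracking false_negative_rate alongside overall accuracy while tuning, rather than accuracy alone.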

Criterion #2:  Presentation –   You will put together a 1 + 5 slide presentation using the RocHackHealth Competition Template, and will be judged on the clarity of your presentation.

Criterion #3:  Deep Magic –   How cuspy, wizardly, and beautiful your solution is.

Data Download Links:

 Training Set:

READ_ME.doc
Challenge_1_Training.Set.csv
admission_type_id.csv
discharge_disposition_id.csv
admission_source_id.csv
Center for Medicare Services web site

 Readmission Challenge File Set:

Challenge_1_Validation_Set.csv

Final Presentation Template:

The final presentation uses a 1 + 5 PowerPoint template.  Only the first 5 data slides will be included in the judging.