Challenge #3 README: DocGraph Provider Community Identification

In this challenge, your team will use clustering and network science methods to identify “teams” of Medicare providers in New York State (NYS) who are linked to each other by shared patients. You will use a subset of publicly available Medicare data sets that contains only NYS providers (doctors, organizations, nurse practitioners, physician assistants, etc.). You will need to link this data set to another file that contains information about each provider’s specialty, as well as to a table containing the geolocation coordinates of each provider at the zip code level. You should identify closely linked teams of providers, determine their characteristics (team composition, etc.), and describe any other interesting features of how they are connected. You will also need to display your results using innovative graphics. The team with the most elegant methods and visualizations will win a $100 prize.
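As a rough illustration of turning shared patients into teams, the Python sketch below builds an undirected, weighted graph in which each edge joins two providers and its weight is their shared-patient count. It then groups providers with label propagation, a simple community-detection method chosen here only for brevity; it is not a prescribed method for this challenge. The sketch assumes each referral row has already been parsed into a `(npi_a, npi_b, shared_count)` tuple. The real column layout of Challenge_3_referral_data.txt is described in Challenge_3_READ_ME.doc and may differ. The function names and the `min_shared` cutoff are hypothetical.

```python
from collections import Counter, defaultdict


def build_graph(edges, min_shared=1):
    """Build an undirected weighted adjacency map from (npi_a, npi_b, shared) rows.

    Self-pairs and rows below the (hypothetical) `min_shared` cutoff are dropped;
    weights from both directions of the same pair are summed.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for a, b, shared in edges:
        if a == b or shared < min_shared:
            continue
        graph[a][b] += shared
        graph[b][a] += shared
    return graph


def label_propagation(graph, max_iter=100):
    """Weighted label propagation.

    Every provider starts in its own team, then repeatedly adopts the label with
    the largest total edge weight among its neighbours until nothing changes.
    """
    labels = {node: node for node in graph}
    order = sorted(graph)  # fixed visiting order keeps results reproducible
    for _ in range(max_iter):
        changed = False
        for node in order:
            scores = Counter()
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            # Deterministic tie-break: smallest label among the top scorers.
            new_label = min(lbl for lbl, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy data (not from the challenge files): two tight trios joined by a weak link.
edges = [
    ("A", "B", 10), ("B", "C", 10), ("A", "C", 10),  # first trio
    ("D", "E", 10), ("E", "F", 10), ("D", "F", 10),  # second trio
    ("C", "D", 1),                                   # weak bridge
]
teams = label_propagation(build_graph(edges))
# A, B, C end up sharing one label; D, E, F share another.
```

Classic label propagation breaks ties at random. Here ties go to the smallest label, and nodes are visited in sorted order, so repeated runs give the same teams. For the full NYS file, libraries such as NetworkX or igraph ship more robust community-detection routines (for example, Louvain) that are worth comparing against.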


  1. DocGraph File: This is a subset of the entire United States file created by the Centers for Medicare & Medicaid Services (CMS) in response to a Freedom of Information (FOI) request by Fred Trotter at DocGraph. The Challenge_3_referral_data.txt (compressed) data set records which NYS providers shared patients in 2013. It offers a way to identify physician teaming and to examine how physician groups or teams work together.
  2. NYS Provider Identifier File: CMS developed the National Plan and Provider Enumeration System (NPPES) to assign unique identifiers to health care providers. These National Provider Identifiers (NPI numbers) are required for CMS to reimburse healthcare providers for their services. The Challenge_3_npi_lookup_data.txt (compressed) data set is a NYS subset of the national file and contains all of the FOIA-disclosable data for active and deactivated healthcare providers.
  3. Provider Specialty Codes: The provider identifier file contains codes for provider specialty. The Challenge_3_Prov_Taxonomy_1_1_13.csv data set is a crosswalk that maps these codes to the declared specialty of the providers in the two tables above.
    1. See the Challenge_3_READ_ME.doc table for a description of the above tables.
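Linking the files comes down to two lookups: NPI to taxonomy code (from the NPI lookup file), then taxonomy code to specialty (from the crosswalk). The sketch below does this with Python’s standard `csv` module and then tallies specialties within each team to describe team composition. The column names (`npi`, `taxonomy_code`, `code`, `specialty`), the made-up NPIs, and the team labels are hypothetical placeholders. Check Challenge_3_READ_ME.doc for the real field names and decompress the files before reading them. The sketch also keeps a single taxonomy code per provider, whereas a real NPI record may list more than one.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy stand-ins for the real files; column names and NPIs are hypothetical.
NPI_LOOKUP = """npi,taxonomy_code
1000000001,207Q00000X
1000000002,207R00000X
1000000003,363LF0000X
1000000004,207RC0000X
"""

TAXONOMY = """code,specialty
207Q00000X,Family Medicine
207R00000X,Internal Medicine
207RC0000X,Cardiovascular Disease
363LF0000X,"Nurse Practitioner, Family"
"""


def load_specialties(npi_text, taxonomy_text):
    """Map each NPI to a specialty name via its taxonomy code."""
    code_to_specialty = {
        row["code"]: row["specialty"]
        for row in csv.DictReader(io.StringIO(taxonomy_text))
    }
    return {
        row["npi"]: code_to_specialty.get(row["taxonomy_code"], "Unknown")
        for row in csv.DictReader(io.StringIO(npi_text))
    }


def team_composition(teams, specialties):
    """Count specialties per team; `teams` maps NPI -> team label."""
    composition = defaultdict(Counter)
    for npi, team in teams.items():
        composition[team][specialties.get(npi, "Unknown")] += 1
    return composition


specialties = load_specialties(NPI_LOOKUP, TAXONOMY)
# Hypothetical team assignments, e.g. from a community-detection step.
teams = {"1000000001": "t1", "1000000002": "t1", "1000000003": "t1", "1000000004": "t2"}
composition = team_composition(teams, specialties)
for team, counts in sorted(composition.items()):
    print(team, dict(counts))
```

Falling back to `"Unknown"` keeps providers whose code is missing from the crosswalk visible in the tallies instead of silently dropping them.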

Judging Criteria:

Teams then need to upload their results by 12:00 PM to the link you will be given.

Teams will be judged on three criteria: (1) Visualization, (2) Presentation, and (3) Deep Magic.

Criterion #1: Visualization – At the end of the competition, you will submit a single visualization of provider teaming for New York State. It can be submitted as a static high-resolution graphic, in HTML or another interactive format, as an app, or as an IPython notebook. Visualizations will be judged on clarity, esthetics, informativeness, and innovation.
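For an HTML or other interactive submission, one option is to export the provider graph as node-link JSON, the `nodes`/`links` layout used by many D3.js force-directed examples. Each node carries its team and specialty so the page can color by either. This is a hedged sketch: the `to_node_link` helper and its field names are hypothetical, and the edges and team labels below are toy values rather than challenge data.

```python
import json


def to_node_link(edges, teams, specialties=None):
    """Serialize a shared-patient graph as node-link JSON-ready data.

    `edges` holds (npi_a, npi_b, shared) rows; `teams` maps NPI -> team label;
    `specialties` (optional) maps NPI -> specialty name.
    """
    specialties = specialties or {}
    npis = sorted({npi for a, b, _ in edges for npi in (a, b)})
    nodes = [
        {"id": npi, "team": teams.get(npi), "specialty": specialties.get(npi, "Unknown")}
        for npi in npis
    ]
    # Link weight ("value") is the shared-patient count, usable for stroke width.
    links = [{"source": a, "target": b, "value": shared} for a, b, shared in edges]
    return {"nodes": nodes, "links": links}


edges = [("A", "B", 10), ("B", "C", 4)]
teams = {"A": "t1", "B": "t1", "C": "t2"}
payload = json.dumps(to_node_link(edges, teams), indent=2)
print(payload)
```

Writing `payload` to a `.json` file next to the HTML page keeps the data separate from the drawing code, so the layout can be restyled without re-running the analysis.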

Criterion #2: Presentation – You will put together a 1 + 5 slide presentation using the RocHackHealth Competition Template and will be judged on how clearly you present your method, findings, and conclusions.

Criterion #3: Deep Magic – How cuspy, wizardly, and beautiful your solution is.

Data Download Links:

Challenge_3_referral_data.txt (compressed)
Challenge_3_npi_lookup_data.txt (compressed)

Presentation Template:

The final presentation uses a 1 + 5 PowerPoint template. Only the first 5 data slides will be included in the judging.