ICD9 data are the diagnosis and procedure codes used by insurance companies to categorize, well, diagnoses and procedures to be determined / performed by medical professionals. Typically, if you visit the doctor for an ailment and file insurance, your provider will list the reason for the visit, any diagnoses, and any procedures performed. This is then sent to the insurance company for processing. They can then use codes to indicate if the procedure is covered, as well as, (for example) determine if the diagnosis was a pre-existing condition. The data is a simple hierarchical structure which is shown in the following diagram.

ERD for storing ICD9 data
We see that diagnoses can have sub-diagnoses, etc. I used this simple structure, and added a qualifying column of ‘record_type’ to indicate if the code listed was an actual diagnosis, or a section header. Sections of diagnoses are part of the data, and can be used in searching the database. This diagram offers a simple and quick design to handle the data given.
Comments are welcome.
December 30th, 2009 at 1:26 pm
Thanks for publishing the ERD. Nice and complete. Is this the model behind icd9data.com?
December 30th, 2009 at 1:35 pm
Hi Sunil,
I’m not associated with the icd9data.com site, though I would imagine the DB design they’ve used is similar in fashion. Given the state of the data available, it seems this is a very good structure for handling it.
-Dave
December 31st, 2009 at 9:24 am
Thanks Dave. Speaking of ICD9 data, where do you get it these days? For free that is. I would be interested in the flat files if possible – the most updated version of course
January 4th, 2010 at 11:39 am
The ICD9 data is not available as a flat-file at a free cost, from what I’ve found. When I looked at loading these tables, I took the RTF format files from the Dept. of Health and Human Services (I believe) and saved as raw text, and attempted to parse them using Perl. It wasn’t perfect, but was a good start.