I've been fortunate enough to work with clinical data this year. It's been a new experience with new challenges.
I was a bit scared at first, since clinical data is always thought of as the holy grail of Pharma data. I was expecting something very complicated, but in reality it's quite straightforward. There were two main challenges. The first was getting around the preconception that, because this is regulated data, you can't do anything with it. To me that meant you can't touch the original data, but it shouldn't stop you duplicating the data and using it elsewhere, to feed back into discovery research or to look at other questions that the trial may not have focussed on. But all that's another story. The second, after winning the first battle, was actually getting someone to agree to give me some data. Even within an organisation this proved tricky, for reasons that lead to yet another story, but that's for another time.
With those resolved I started to look at the data. The data was in a format aligned with the SDTM standard published by CDISC. The first thing that struck me was actually the simplicity of the data. I was expecting something a lot scarier; what turned out to be scary was how badly put together the data structures were. They seemed to be aligned with the physical CRF forms used to submit data, which made the data a bit powerless rather than powerful. However, with linked data we can change all of that.
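To make the contrast between form-shaped records and linked data concrete, here is a minimal sketch in plain Python, with no RDF library. The rows, the subject identifiers and the URI scheme for nodes are made up for illustration; only the column names (USUBJID, SEX, AGE, AESEQ, AETERM) and the domain codes (DM, AE) are taken from SDTM. It is an assumption about how such a conversion could look, not the code I actually used.

```python
# Illustrative namespace and rows; real SDTM datasets have many more columns.
NS = "http://cdisc.org/CDISC/Ontology/SDTM#"

# Flat, form-shaped records: one list per SDTM domain
# (DM = demographics, AE = adverse events).
dm_rows = [
    {"USUBJID": "STUDY1-001", "SEX": "F", "AGE": 54},
    {"USUBJID": "STUDY1-002", "SEX": "M", "AGE": 61},
]
ae_rows = [
    {"USUBJID": "STUDY1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"USUBJID": "STUDY1-001", "AESEQ": 2, "AETERM": "NAUSEA"},
]


def rows_to_triples(domain, rows, key_columns):
    """Turn each row into a node and each non-empty cell into a triple."""
    triples = []
    for row in rows:
        # Build a node identifier from the domain and its key columns
        # (a hypothetical URI scheme, chosen only for this sketch).
        key = "-".join(str(row[column]) for column in key_columns)
        node = f"{NS}{domain}/{key}"
        triples.append((node, "rdf:type", f"{NS}{domain}"))
        for column, value in row.items():
            if value is not None:
                triples.append((node, f"{NS}{column}", value))
    return triples


triples = rows_to_triples("DM", dm_rows, ["USUBJID"]) + rows_to_triples(
    "AE", ae_rows, ["USUBJID", "AESEQ"]
)

# Both domains now sit in one graph, and the shared USUBJID value
# becomes a join point instead of a column repeated per form.
subjects_with_events = {
    obj
    for subj, pred, obj in triples
    if pred == NS + "USUBJID" and subj.startswith(NS + "AE/")
}
print(sorted(subjects_with_events))
```

Once every cell is a statement about a node, it no longer matters which form a value arrived on; the same pattern applies to any other domain.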
So all we need now is the SDTM ontology to align and format the data against. Asking around I found some partial bits and pieces but nothing substantial to use with all the data I had, so there was no choice other than to write my own. Having studied the SDTM standard it didn't take long, but there were several choices to make along the way. The version I came up with isn't the way I would have designed it from a true ontology approach. There was an existing standard and I didn't want to stray too far away from it so that the concepts would remain familiar to people. Yet I also had to enable the data.
When looking at the ontology, those familiar with clinical data and SDTM will see the usual concepts such as demographics, subject visits, adverse events and medical history, along with all the others. What I did do was extract some of the major linking elements, which were not in fact described separately in the standard. So you will see concepts such as "unique subject", study and a few others. I've also added the properties, with names matching the standard. In some cases I've added constraints as well, but not everywhere. Concepts are of course linked with object properties, and I've used the "unique subject" class as the center of attention, linking to just about everything. That's on purpose, as in most cases you want to correlate the subject with all the other data.
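The hub idea can be sketched with a handful of triples in plain Python. The class name UniqueSubject, the linking properties hasDemographics and hasAdverseEvent, and the node identifiers are hypothetical names chosen for this example, and the ontology itself may name them differently; SEX and AETERM are SDTM variable names, used here as data properties.

```python
NS = "http://cdisc.org/CDISC/Ontology/SDTM#"  # the placeholder namespace from this post

# Hypothetical instance data: every record hangs off a UniqueSubject node.
triples = {
    ("subj:001", "rdf:type", NS + "UniqueSubject"),
    ("subj:001", NS + "hasDemographics", "dm:001"),
    ("subj:001", NS + "hasAdverseEvent", "ae:001-1"),
    ("subj:001", NS + "hasAdverseEvent", "ae:001-2"),
    ("subj:002", "rdf:type", NS + "UniqueSubject"),
    ("subj:002", NS + "hasDemographics", "dm:002"),
    ("subj:002", NS + "hasAdverseEvent", "ae:002-1"),
    ("dm:001", NS + "SEX", "F"),
    ("dm:002", NS + "SEX", "M"),
    ("ae:001-1", NS + "AETERM", "HEADACHE"),
    ("ae:001-2", NS + "AETERM", "NAUSEA"),
    ("ae:002-1", NS + "AETERM", "HEADACHE"),
}


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def adverse_event_terms_by_sex():
    """Correlate two domains by walking out from each subject hub."""
    result = {}
    subjects = sorted(
        s for s, p, o in triples
        if p == "rdf:type" and o == NS + "UniqueSubject"
    )
    for subject in subjects:
        for demographics in objects(subject, NS + "hasDemographics"):
            for sex in objects(demographics, NS + "SEX"):
                for event in objects(subject, NS + "hasAdverseEvent"):
                    terms = objects(event, NS + "AETERM")
                    result.setdefault(sex, set()).update(terms)
    return result


print(adverse_event_terms_by_sex())
```

Because both the demographics and the adverse events are reached from the same subject node, correlating them is a matter of following two links rather than joining two tables on a repeated identifier column.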
I've used the ontology in several pieces of software and it's worked pretty well. Asking questions of the data has been straightforward and has produced some good demos.
I'm not saying the ontology is brilliant. I'm sure you'll find mistakes, omissions and areas where I've not gone into enough detail. However, I'd like to do my bit for the community and put the ontology out there for people to use. Feel free to alter it for your needs, and please give feedback on your experiences with it.
You can get it here. Have fun.
p.s. I've used the namespace "http://cdisc.org/CDISC/Ontology/SDTM#" which is not a valid one.