Taming the Emergency Department Chief Complaint:
Transforming Unstructured Data Elements for Secondary Uses
Stephanie W. Haas
School of Information and Library Science
UNC-CH
The Challenge
How to use text created for one purpose for new applications
reuse, repurposing, recycling, refactoring
different context of use
different audience with different knowledge, expectations
different requirements (e.g., precision, standardization, expressiveness)
What if the original text wasn't that easy to use in the first place?
Cast of Characters
Debbie Travers and Anna Waller, Dept of Emergency Medicine, UNC-CH
Emergency Medicine Text Processor (EMT-P)
North Carolina Emergency Department Database (NCEDD) and staff
North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT) and staff
Emergency Department Patient Records
Primary uses
gather and record initial information (symptoms, situation, registration)
guide initial patient care
record additional findings, treatment, diagnoses
Secondary uses
clinical and other research
quality control
administration
syndromic surveillance (early warning)
Characteristics of Records
Data Elements for Emergency Department Systems (DEEDS), recommended in 1997
Some elements are standardized
final diagnosis uses ICD-9 codes
temperature usually in °C
Others are not
chief complaint (CC), triage notes, physician and nurse notes
Survey of EDs
NC & Seattle/King County WA
Travers, Waller, Haas, Lober & Beard (2003)
Is ED data ready for use in bioterrorism surveillance systems?
availability, timeliness, sources, standards
NC 1999, n=96
NC 2003, n=80
Seattle/King County 2003, n=17
Available in Electronic Form?
Other results
Availability of diagnosis: 24 hrs to 1 month
Only 2 respondents reported knowing more than "a little" or "nothing" about DEEDS
Wide variety of information systems in use
local specialization is common
different systems in hospital and ED
rapid change in IT
Not easy to aggregate or merge data from different EDs
Chief Complaint (CC)
Recorded at triage (2-5 minute encounter)
Reason patient came to the ED
In NC 2003 survey
51% electronic, more since then
1-4 fields, 12 characters to "paragraphs"
free text, locally- and/or vendor-developed pick list, ICD-9-CM
patient's exact words, paraphrase, codes
Characteristics of CC Pick Lists
Travers, Haas, Waller, Reeder, & Spicer (2005)
Vary in size, specificity, restrictions on use
Size: 68 terms to 1,680 terms
Specificity: pain vs. abd pain or neck pain
Restrictions on use:
no additions, triage notes used for additional detail
free text additions or substitutions for CC
Unknown sources of lists
Characteristics of CC Free Text
(Travers, Travers & Haas)
Telegraphic, semantically dense
cough/diarrhea/congestion
cough/abdominal pain/inhaled toxic fumes
abd pain- loss of appetite
gun shot wound knee
Multiple ways of expressing a concept
chst pn, cp, chert pn, chest/arm pain, chest pain
Ambiguous expressions
Rx
Variable punctuation, spacing, capitalization
h/a/rt shoulder & left hip pain x2d
tvfellonhim
stomachulcer
Coordinate structures and other use of slash
hip/thigh pain
tingling feet/hands
nausea neck/arm pain/see spots
Abbreviations, initialisms, truncations, misspellings
ha, h/a, headache
ETOH withdrawal
pian
should, shoulde, shldr, shoulder
n/v/d
Indecipherable
tacky
RBLD
5320407
other
feels weird!
fuzzy shin
Goals
Standardization (of entry terms or post-entry mapping routines)
Translate from immediate clinical use for individual patient to aggregated data for secondary uses
Extract CC concepts
clean, normalize and map to UMLS
Interpret modifiers, qualifiers, negation, temporal relationships, other relationships (e.g., co-morbidity)
EMT-P (Travers and others)
Clean, normalize, and map to UMLS concepts
Identify concepts needed to express CC
Prepare CCs for aggregation
Improve data quality of CCs for secondary uses
Identify ED concepts missing from or inappropriately defined in UMLS
jail clearance, MVC
Round 1
expand acronyms, abbreviations, truncations, correct common misspellings
Round 2
handle a variety of punctuation, expand coordinate structures, segment into terms
Round 3
delete unnecessary modifiers, qualifiers, temporal clauses
Provide more accurate picture of reasons people come to ED
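The three rounds above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not EMT-P itself: the lookup tables, the function names (round1, round2, round3, preprocess) and the rule for sharing a head noun across a slash are all hypothetical. Round 3 here removes only duration expressions, and the final mapping to UMLS is omitted.

```python
import re

# Hypothetical lookup tables, for illustration only.
EXPANSIONS = {"abd": "abdominal", "pn": "pain", "chst": "chest",
              "cp": "chest pain", "ha": "headache", "h/a": "headache",
              "pian": "pain", "shldr": "shoulder"}
BODY_PARTS = {"hip", "thigh", "neck", "arm", "chest", "feet", "hands"}
SHARED_HEADS = {"pain"}
DURATION = re.compile(r"\bx\s*\d+\s*(?:d|days?|wks?|weeks?)\b")


def round1(cc):
    """Expand abbreviations and truncations; correct common misspellings."""
    return " ".join(EXPANSIONS.get(tok, tok) for tok in cc.lower().split())


def round2(cc):
    """Split on slashes and expand simple coordinate structures into terms."""
    parts = [p.strip() for p in cc.split("/") if p.strip()]
    if not parts:
        return []
    head = parts[-1].split()[-1]  # last word of the last slash-separated part
    terms = []
    for part in parts:
        # "hip/thigh pain": a bare body part borrows the shared head noun.
        if part in BODY_PARTS and head in SHARED_HEADS:
            part = f"{part} {head}"
        terms.append(part)
    return terms


def round3(term):
    """Delete duration expressions such as 'x2d' and tidy whitespace."""
    return " ".join(DURATION.sub("", term).split())


def preprocess(cc):
    """Run the three rounds and return the non-empty terms."""
    cleaned = (round3(term) for term in round2(round1(cc)))
    return [term for term in cleaned if term]


print(preprocess("hip/thigh pain"))
print(preprocess("chst pn x2d"))
```

The head-sharing rule is deliberately narrow so that "cough/abdominal pain/inhaled toxic fumes" is segmented without attaching a head noun to "cough". It does not distribute leading modifiers, so "tingling feet/hands" is split but "tingling" stays with "feet" only.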
Validity
Not just matching a UMLS concept, but matching correctly
Sample of matches validated by clinicians
equivalent: a-fib, atrial fibrillation
related: cramps, cramps(muscle)
no match: arrest, law enforcement arrest
No agreement: vag bleed, vag bleeding, bleeding of vagina
Influence of context on interpretation (i.e., w/o patient): a concern for secondary uses?
Next: The Triage Note
Additional information to supplement CC
Shares many characteristics of CC
PT AMB TO TRIAGE AT 1440 C/ORASH ALL OVER BODY ONSET THURSDAY AM, RED RAISED SPOTS TO NECK, PT REPORTS ITCHI G NO PAIN, ALSO HA X 1 WEEK DENIES INSECT BITES OTHER C/O OR FEVER
Length varies from phrases to "sentences"
Form varies from rough notes to auto-complete templates producing well-formed text
More negation, temporal relationships, other discourse-level relationships
Essentially no limit on topic, and therefore none on structure or vocabulary
contrast to lab reports (Friedman)
Immediate goals
Extract concepts as with CC
Correct interpretation of negation
denies fever
no sob, fvr
Chapman et al., 2001
Extract temporal relationships
Extract additional relationships
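Negation handling can be sketched with a simple trigger-and-scope rule. This is only loosely in the spirit of the work cited above (Chapman et al., 2001) and is not that algorithm: the trigger list, the scope-ending words, and the function name tag_negation are assumptions made for illustration.

```python
import re

# Hypothetical word lists, for illustration only.
NEGATION_TRIGGERS = {"no", "denies", "without"}
SCOPE_ENDERS = {"also", "but", "reports"}


def tag_negation(text):
    """Return (token, is_negated) pairs, skipping triggers and scope enders.

    A trigger negates every following token until a scope-ending word
    or the end of the text is reached.
    """
    tokens = re.findall(r"[a-z0-9/]+", text.lower())
    tagged = []
    negated = False
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            negated = True
            continue
        if tok in SCOPE_ENDERS:
            negated = False
            continue
        tagged.append((tok, negated))
    return tagged


print(tag_negation("denies fever"))
print(tag_negation("no sob, fvr"))
```

Letting the scope run across commas covers "no sob, fvr", but it would over-negate a note in which a new, asserted complaint follows the comma without a scope-ending word.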
Concepts of Time
past, present, future
span, point, duration, frequency
relative time (yesterday), absolute time (2/28/06, 3:30 p.m.)
vagueness and uncertainty
events
sequential, simultaneous, overlapping
cause, result, unrelated
report (when someone told someone else)
[Timeline figure, earliest to latest: 1 Week Ago (headache), Thursday AM (rash onset), Now (triage)]
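One narrow piece of the temporal problem, turning a duration such as "X 1 WEEK" or "x2d" into an estimated onset relative to the triage time, can be sketched as follows. The pattern, the unit table, the function name onset_from_duration and the triage date used in the example are all hypothetical. Relative expressions such as "Thursday AM" and the vagueness of "1 week" are not handled.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit table and pattern for "x N unit" durations.
UNIT_DAYS = {"d": 1, "day": 1, "days": 1,
             "wk": 7, "wks": 7, "week": 7, "weeks": 7}
DURATION = re.compile(r"\bx\s*(\d+)\s*(days?|d|weeks?|wks?)\b", re.IGNORECASE)


def onset_from_duration(note, triage_time):
    """Return an estimated onset datetime for each duration found in note."""
    onsets = []
    for count, unit in DURATION.findall(note):
        days = int(count) * UNIT_DAYS[unit.lower()]
        onsets.append(triage_time - timedelta(days=days))
    return onsets


# Hypothetical triage time; the slides give 1440 but no date for the note.
triage = datetime(2006, 2, 28, 14, 40)
print(onset_from_duration("ALSO HA X 1 WEEK", triage))
```

The result is a point estimate. A fuller treatment would keep an interval or an uncertainty flag, because "x 1 week" rarely means exactly seven days.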
Syndromic Surveillance
Close to real-time identification of potential outbreak or occurrence of health problems of interest
gastrointestinal severe, fever/rash illness
Follow-up by epidemiologists, clinicians, public health officials
Limited by
availability and readiness of data
sensitivity and specificity of classification of data into syndrome definitions
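For reference, sensitivity and specificity follow the standard definitions, computed against a reference standard such as clinician review. A minimal sketch, with hypothetical function names and made-up counts:

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true syndrome cases that the classifier flagged."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases in the reference standard")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Fraction of non-cases that the classifier correctly left unflagged."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases in the reference standard")
    return true_neg / total


# Made-up counts, for illustration only.
print(sensitivity(true_pos=8, false_neg=2))
print(specificity(true_neg=90, false_pos=10))
```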
North Carolina Emergency Department Database (NCEDD)
Collects ED data daily from 63% of NC EDs
Collects up to 22 data elements, including CC and triage note
Aggregated data (graphs and reports) available for use by hospitals, researchers, public health officials
Uses EMT-P to pre-process CC
North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT)
Collects data daily from NCEDD, Carolinas Poison Center, NC Pre-hospital Medical Information System (PreMIS), NC State College of Veterinary Medicine Laboratories.
Coming soon: Piedmont Wildlife Center
Classifies data into syndromes
Creates daily reports for public health officials and other approved users
Syndromic definitions:
Map CDC definitions into DB queries
ED CC pre-processed with EMT-P
Other data in various forms: DB queries can be exceedingly messy!
Sensitivity and specificity are of concern
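To make the idea of mapping a syndrome definition into a DB query concrete, here is a minimal sketch using an in-memory SQLite table. The table layout, the keyword pair, and the function name classify are hypothetical. The query is not an actual CDC or NC DETECT definition.

```python
import sqlite3

# Hypothetical keyword query; not an actual CDC or NC DETECT definition.
FEVER_RASH_QUERY = """
    SELECT visit_id FROM visits
    WHERE cc LIKE '%fever%' AND cc LIKE '%rash%'
    ORDER BY visit_id
"""


def classify(rows, query=FEVER_RASH_QUERY):
    """Load (visit_id, cc) rows into a scratch table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, cc TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


rows = [(1, "fever rash"), (2, "chest pain"), (3, "rash")]
print(classify(rows))
```

Substring matching of this kind would also flag a record reading "rash, denies fever", which is one way specificity suffers when free text is queried without pre-processing.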
Merging Data from Multiple Sources
CC and triage note (collected at same time and place, serve same purpose)
recognize agreement, paraphrase
recognize and fill gaps for complete description of symptoms, situation
recognize disagreement -- which element should be believed?
Then add different types of data elements from various sources
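The comparison of CC and triage note can be sketched as set operations, assuming both elements have already been normalized to the same concept names and that negation has been resolved. The data representation (concept mapped to True if asserted, False if negated) and the function name compare_sources are assumptions for illustration.

```python
def compare_sources(cc, triage):
    """Compare two concept maps of the form {concept: asserted?}.

    Returns concepts on which the sources agree, concepts on which they
    disagree, and gaps that one source could fill for the other.
    """
    shared = cc.keys() & triage.keys()
    return {
        "agreement": sorted(c for c in shared if cc[c] == triage[c]),
        "disagreement": sorted(c for c in shared if cc[c] != triage[c]),
        "only_in_cc": sorted(cc.keys() - triage.keys()),
        "only_in_triage": sorted(triage.keys() - cc.keys()),
    }


# Made-up records, for illustration only.
cc = {"rash": True, "fever": True}
triage = {"rash": True, "fever": False, "headache": True}
print(compare_sources(cc, triage))
```

The sketch only reports a disagreement. Deciding which element should be believed remains the open question raised on the slide.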
Remaining Issues
Controlled vocabulary for CC?
CC symposium
Extracting additional information
time, situation (e.g., circumstances of injury), co-morbidity, history, other relationships
Merging information from disparate sources
Reconciling primary and secondary stakeholders' needs
Stephanie W. Haas
stephani@ils.unc.edu
EMT-P
http://www.med.unc.edu/wrkunits/2depts/emergmed/EMTP/index.html
NCEDD
http://www.ncedd.org/index.html
NC DETECT
http://www.ncdetect.org/