# sultanum2019doccurate

# Doccurate: A Curation-Based Approach for Clinical Text Visualization


Sultanum et al. present Doccurate, a novel system embodying a curation-based approach for the visualization of large clinical text datasets [1].

Doccurate provides automation in data processing and customization for visualizations while preserving the link to the original data.

The authors believe that most existing Electronic Healthcare Record (EHR) visualization systems are not designed for clinical use, this prevents clinicians from adapting and integrating those systems into their current workflows.


According to Sultanum et al., preserving the original text in clinical notes is crucial for the visualizations of EHRs. This was their conclusion after interviewing 6 domain experts. Therefore, while Natural Language Processing (NLP) is being used to automate the process of extracting structure from large collections, Doccurate preserves the original text (as shown in Figure 1 C.2) and provides the ability to customize NLP models to re-extract data from the original text (as shown in Figure 1 D.1 and D.2). This retains the information completeness. SNOMED-CT[2] is used as the medical taxonomies database for the underlying entity recognition algorithms used for Doccurate.

In Figure 1, section B shows all symptoms extracted from the original letter and are mapped to a timeline shown in B.4. Section D shows user options to search and filter results by keywords and enable results to be saved in a collection. Stored collections are listed in section A.3. All results are highlighted in section C along with the original text.

Other user options such as adjusting bin interval and document types are available in section A.2.

Related Work

ThemeRiver[4] is a pioneer in visualizing summarized data streams for evolving prevalence data of patients, it however lacks the ability to map relationships between multiple entities within a single EHR document.

VarifocalReader[5] provides the ability to visualize both a single text document and a collection of text documents, as a general approach it doesn't support the unique characteristics of EHR documents (medical taxonomies are not being highlighted and mapped).

TimeLineCurator[6] and ConceptVector[7] offer the ability to curate models and inform date characteristics but are limited in the level of automation that Doccurate offers.

Data Characteristics

  • Data source: MIMIC-III[3] a freely accessible critical care database. The dataset is available on GitHub (opens new window)
  • Size: 3 patients, 900 documents
  • Spatial dimensionality: 2D
  • Temporal dimensionality: static
  • Type: multivariate

Visualization Techniques

  • Glyph
  • Sunburst chart
  • Matrix plot

Papers Cited


Years Spanned


Application Domain

  • Electronic Health Record Visualization
  • Text Visualization
  • Medical focus: Critical/Intensive Care, Cross-Specialty


  1. Sultanum, N., Singh, D., Brudno, M., & Chevalier, F. (2019). Doccurate: A Curation-Based Approach for Clinical Text Visualization. IEEE Transactions on Visualization and Computer Graphics, 25(1), 142–151. https://doi.org/10.1109/TVCG.2018.2864905 ↩︎

  2. SNOMED International. (n.d.). SNOMED CT. Retrieved July 4, 2019, from http://www.snomed.org/ ↩︎

  3. Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L.-W. H., Feng, M., Ghassemi, M., … Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data. https://doi.org/10.1038/sdata.2016.35 ↩︎

  4. Havre, S., Hetzler, E., Whitney, P., & Nowell, L. (2002). ThemeRiver: Visualizing thematic changes in large document collections. IEEE Transactions on Visualization and Computer Graphics. https://doi.org/10.1109/2945.981848 ↩︎

  5. Koch, S., John, M., Wörner, M., Müller, A., & Ertl, T. (2014). VarifocalReader - In-depth visual analysis of large text documents. IEEE Transactions on Visualization and Computer Graphics. https://doi.org/10.1109/TVCG.2014.2346677 ↩︎

  6. Fulda, J., Brehmel, M., & Munzner, T. (2016). TimeLineCurator: Interactive Authoring of Visual Timelines from Unstructured Text. IEEE Transactions on Visualization and Computer Graphics. https://doi.org/10.1109/TVCG.2015.2467531 ↩︎

  7. Park, D., Kim, S., Lee, J., Choo, J., Diakopoulos, N., & Elmqvist, N. (2018). ConceptVector: Text Visual Analytics via Interactive Lexicon Building Using Word Embedding. IEEE Transactions on Visualization and Computer Graphics. https://doi.org/10.1109/TVCG.2017.2744478 ↩︎

🔄 Last Updated: 7/8/2020, 9:48:18 PM