HL7-to-linelist-Gonorrhoea • HL7.R

This vignette provides an example HL7 to linelist example.

library(HL7.R)
#> HL7.R - for help see vignette('Getting-started', package = 'HL7.R') or vignette('package = 'HL7.R') for other examples

The task

Antibiotic resistance results needed extraction from HL7 messages. There were multiple results within each message. The goal was to create a table with 1 row per result along with personal and sample identifiers.

The data

There are 2 Gonorrhoea HL7 files containing antibiotic resistance results within this package.

gonorrhoea_files <-
  system.file('extdata', package = 'HL7.R') %>% 
  list.files(pattern = 'gonorrhoea', full.names = TRUE)

gonorrhoea_files
#> [1] "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-1.hl7"        "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-2.hl7"        "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-3_2_bugs.hl7" "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-3.hl7"       
#> [5] "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-4_2_bugs.hl7"

Parsing

The messages were parsed using HL7.R.

gonorrhoea_parsed_list <-
  lapply(gonorrhoea_files, parse_hl7_message)

# 2 elements (files), with 15 elements (segments) each
length(gonorrhoea_parsed_list)
#> [1] 5
sapply(gonorrhoea_parsed_list, length)
#> [1] 15 15 24 24 22

Data-wrangling

This section explains several steps that were later combined into a single wrangle function. Thus, we begin with a single parsed HL7.

msg <- gonorrhoea_parsed_list[[4]]

Extraction function

We expect the possibility of completely missing fields from parsing. For sake of example, let us pretend there could be a 5th element in PatientName. Selection of this element when missing produces an error

msg$PID$PatientName[[5]]
#> Error in msg$PID$PatientName[[5]]: subscript out of bounds

What we want is for it to return a ‘blank’ value. The function purrr::pluck allows for a default value when a list element is not found

purrr::pluck(msg$PID$PatientName, 5)
#> NULL
purrr::pluck(msg$PID$PatientName, 5, .default = "")
#> [1] ""

or we can roll our own

my_pluck <- function(x, .default = ''){
  tryCatch(x, error = function(e) '')
}

my_pluck(msg$PID$PatientName[[5]])
#> [1] ""
# and correct otherwise
my_pluck(msg$PID$PatientName[[2]])
#> [1] "HIJLK"

regardless, to save on typing, we create an alias function.

# p = purrr::pluck with .default = ''
p <- function(x, ...){ 
  purrr::pluck(x, ..., .default = '') 
}

Personal identifiers

Personal identifiers occur only once in the message. We can safely extract them into a pid data.frame object. Note the use of with() to further save on typing.

pid <-
  with(msg$PID, {
    data.frame(
      first_name    = p(PatientName, 2),
      middle_name = p(PatientName, 3),
      last_name = p(PatientName, 1),
      street_address = p(PatientAddress, 1),
      suburb = p(PatientAddress, 3),
      state = p(PatientAddress, 4),
      postcode = p(PatientAddress, 5),
      dob = DateTimeOfBirth,
      gender = Sex)
  })

pid
#>   first_name middle_name last_name street_address suburb state postcode          dob gender
#> 1      HIJLK                 ABCDE                                 2010 199011190000      M

Sample number and date

Simple extraction from the observation request segment OBR.

obr <-
  with(msg$OBR, {
    data.frame(specimen_number = p(FillerOrderNumber, 1),
               specimen_date = p(RequestedDatetime, 1)
    )
  })

obr
#>   specimen_number specimen_date
#> 1      22R474259Q 20220722+1000

Results data

The part is complex. The data is held among multiple OBX segments. We need to go through them and extract certain parts, as well as conditionally output values depending on what we find.

obxs <- grep('OBX', names(msg))

obxs # indexes of OBX in message
#>  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Sample site

Extract the sample site from the following OBX segment:

OBX|1|ST|M6706^Site^NATA1234|1|Urethra||||||F||0|202203171808+1000|JOHNWICK

We expect the segment to have a x$ValueType == 'ST' & x$ObservationIdentifier[[2]] == 'Site'. If this is found extract p(x$ObservationValue, 1) (with x being the selected segment) otherwise output "". Note I’ve used element names, but positions also work: x[[2]] == 'ST' & x[[3]][[2]] == 'Site', and x[[5]][[1]].

obxs_st_site <-
  sapply(msg[obxs], function(x){
    x$ValueType == 'ST' & x$ObservationIdentifier[[2]] == 'Site'
  })

if (any(obxs_st_site)) {
  msg_site_ind <- obxs[obxs_st_site]
  obx_site <- data.frame(site = p(msg[[msg_site_ind]]$ObservationValue, 1))
} else {
  obx_site <- data.frame(site = NA_character_)
}

obx_site
#>            site
#> 1 Urethral swab

Antibiotic resistance results

Again, we look for results and act upon them if found, from

OBX|7|CE|18906-8^Ciprofloxacin Susc Islt^LN^M20007^Ciprofloxacin^NATA1234|7|S^Susceptible^L|... OBX|8|CE|18895-3^Ceftriaxone Susc Islt^LN^M20062^Ceftriaxone^NATA1234|8|S^Susceptible^L|...

The step returns NULL for segments without any hits, so we remove them.

antibiotics_OR_grep <- "Penicillin|Ciprofloxacin|Ceftriaxone|Azithromycin"

obx_tmp <-
  lapply(msg[obxs], function(x){
    
    # TODO could examine code in position 1
    # TODO could match values rather than string manipulation
    obx_id <- p(x$ObservationIdentifier, 2)
    
    if (grepl(pattern = antibiotics_OR_grep, x = obx_id, ignore.case = TRUE) ){
      
      if (grepl(' Susc ', obx_id)) {
        
        data.frame(antibiotic = sub(' .*', '', obx_id),
                   type = 'susceptibility',
                   value = p(x$ObservationValue, 1))
        
      } else if (grepl('MIC', obx_id)) {
        
        data.frame(antibiotic = sub(' .*', '', obx_id),
                   type = 'mic',
                   value = p(x$ObservationValue, 1))
      }
      
    }
  })

is.not.null <- function(x) { !is.null(x) }

obx_not_nulls <- sapply(obx_tmp, is.not.null)

obx_tmp[obx_not_nulls]
#> $OBX.9
#>   antibiotic           type value
#> 1 Penicillin susceptibility     R
#> 
#> $OBX.10
#>      antibiotic           type value
#> 1 Ciprofloxacin susceptibility     R
#> 
#> $OBX.11
#>    antibiotic           type value
#> 1 Ceftriaxone susceptibility     S
#> 
#> $OBX.12
#>     antibiotic           type value
#> 1 Azithromycin susceptibility     S
#> 
#> $OBX.15
#>   antibiotic           type value
#> 1 Penicillin susceptibility     R

And further wrangle the results - row bind, complete results for 2 test types, and do some conversions. Some tests had categorical outputs (susceptibility categories), while others we numerical (MIC) and needed conversion.

categorise_etest_mic <- function(x){
  x <- sub('<', '', x)
  x <- as.numeric(x)
  dplyr::case_when(
    x < 0.002 ~ 'Susceptible',
    x < 0.05 ~ 'Less Susceptible',
    TRUE ~ 'Resistant'
  )
}

if (any(obx_not_nulls)){
  obx_result <-
    obx_tmp[obx_not_nulls] %>%
    do.call(dplyr::bind_rows, .) %>%
    # TODO - capture multiple strain information (was done, likely not PR'd before leaving)
    dplyr::distinct(antibiotic, type, value) %>% 
    tidyr::complete(type = c('susceptibility', 'mic')) %>%
    tidyr::pivot_wider(names_from = 'type', values_from = 'value') %>%
    dplyr::mutate(susceptibility = ifelse(test = is.na(susceptibility) & !is.na(mic),
                                          yes = categorise_etest_mic(mic),
                                          no = susceptibility))
} else {
  obx_result <- data.frame(antibiotic = NA, susceptibility = NA, mic = NA)
}

obx_result
#> # A tibble: 5 × 3
#>   antibiotic    mic   susceptibility
#>   <chr>         <chr> <chr>         
#> 1 NA            NA    NA            
#> 2 Penicillin    NA    R             
#> 3 Ciprofloxacin NA    R             
#> 4 Ceftriaxone   NA    S             
#> 5 Azithromycin  NA    S

Completing the linelist

msh <- data.frame(lab = p(msg$MSH$SendingFacility, 1))

cbind.data.frame(file = attr(msg, 'filename'), pid, obr, obx_site, obx_result, msh)
#>               file first_name middle_name last_name street_address suburb state postcode          dob gender specimen_number specimen_date          site    antibiotic  mic susceptibility  lab
#> 1 gonorrhoea-3.hl7      HIJLK                 ABCDE                                 2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab          <NA> <NA>           <NA> BEOW
#> 2 gonorrhoea-3.hl7      HIJLK                 ABCDE                                 2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab    Penicillin <NA>              R BEOW
#> 3 gonorrhoea-3.hl7      HIJLK                 ABCDE                                 2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab Ciprofloxacin <NA>              R BEOW
#> 4 gonorrhoea-3.hl7      HIJLK                 ABCDE                                 2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab   Ceftriaxone <NA>              S BEOW
#> 5 gonorrhoea-3.hl7      HIJLK                 ABCDE                                 2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab  Azithromycin <NA>              S BEOW

As a complete function

Being happy with the code above, they were combined into a single function.

extract_pid_and_antibiotic_data <- function(msg, antibiotics = c('Penicillin', 'Ciprofloxacin', 'Ceftriaxone', 'Azithromycin')){
  
  antibiotics_OR_grep <- paste0(antibiotics, collapse = "|")
  
  p <- function(x,...){  purrr::pluck(x,...,.default = '') }
  
  pid <-
    with(msg$PID, {
      data.frame(
        first_name  = p(PatientName, 2),
        middle_name = p(PatientName, 3),
        last_name = p(PatientName, 1),
        street_address = p(PatientAddress, 1),
        suburb = p(PatientAddress, 3),
        state = p(PatientAddress, 4),
        postcode = p(PatientAddress, 5),
        dob = DateTimeOfBirth,
        gender = Sex)
    })
  
  obr <-
    with(msg$OBR, {
      data.frame(specimen_number = p(FillerOrderNumber, 1),
                 specimen_date = p(RequestedDatetime, 1)
      )
    })
  
  obxs <- grep('OBX', names(msg))
  
  obxs_st_site <-
    sapply(msg[obxs], function(x){
      x$ValueType == 'ST' & x$ObservationIdentifier[[2]] == 'Site'
    })
  
  if (any(obxs_st_site)) {
    msg_site_ind <- obxs[obxs_st_site]
    obx_site <- data.frame(site = p(msg[[msg_site_ind]]$ObservationValue, 1))
  } else {
    obx_site <- data.frame(site = NA_character_)
  }
  
  # malformed HL7?
  if (is.na(obx_site$site)){
    obxs_st_site <-
      sapply(msg[obxs], function(x){
        x$ObservationIdentifier[[2]] == 'Site'
      })
    if (any(obxs_st_site)) {
      msg_site_ind <- obxs[obxs_st_site]
      warning('This message did not have ValueType = ST but the site ',
              p(msg[[msg_site_ind]]$ObservationValue, 1), ' was found.')
    }
  }
  
  obx_tmp <-
    lapply(msg[obxs], function(x){
      
      # TODO could examine code in position 1
      # TODO could match values rather than string manipulation
      obx_id <- p(x$ObservationIdentifier, 2)
      
      if (grepl(pattern = antibiotics_OR_grep, x = obx_id, ignore.case = TRUE) ){
        
        if (grepl(' Susc ', obx_id)) {
          
          data.frame(antibiotic = sub(' .*', '', obx_id),
                     type = 'susceptibility',
                     value = p(x$ObservationValue, 1))
          
        } else if (grepl('MIC', obx_id)) {
          
          data.frame(antibiotic = sub(' .*', '', obx_id),
                     type = 'mic',
                     value = p(x$ObservationValue, 1))
        }
        
      }
    })
  
  
  is.not.null <- function(x) { !is.null(x) }
  
  obx_not_nulls <- sapply(obx_tmp, is.not.null)
  
  if (any(obx_not_nulls)){
    obx_result <-
      obx_tmp[obx_not_nulls] %>%
      do.call(dplyr::bind_rows, .) %>%
      # TODO - capture multiple strain information (was done, likely not PR'd before leaving)
      dplyr::distinct(antibiotic, type, value) %>% 
      tidyr::complete(type = c('susceptibility', 'mic')) %>%
      tidyr::pivot_wider(names_from = 'type', values_from = 'value') %>%
      dplyr::mutate(susceptibility = ifelse(test = is.na(susceptibility) & !is.na(mic),
                                            yes = categorise_etest_mic(mic),
                                            no = susceptibility))
  } else {
    obx_result <- data.frame(antibiotic = NA, susceptibility = NA, mic = NA)
  }
  
  msh <- data.frame(lab = p(msg$MSH$SendingFacility, 1))
  #msh = data.frame(lab = 'LAB_33', test = '698-1')
  
  cbind.data.frame(file = attr(msg, 'filename'), pid, obr, obx_site, obx_result, msh)
  
}

Running the function

gonorrhoea_parsed_list %>% 
  lapply(extract_pid_and_antibiotic_data) %>% 
  do.call(dplyr::bind_rows, .)
#>                       file first_name middle_name last_name      street_address      suburb state postcode          dob gender specimen_number specimen_date          site    antibiotic  mic susceptibility   lab
#> 1         gonorrhoea-1.hl7      Smith                  John     123 FAKE STREET   LIVERPOOL   NSW     2000 199201010000      M      21R156637Q 20220317+1000       Urethra    Penicillin 0.25      Resistant SLEAS
#> 2         gonorrhoea-1.hl7      Smith                  John     123 FAKE STREET   LIVERPOOL   NSW     2000 199201010000      M      21R156637Q 20220317+1000       Urethra Ciprofloxacin <NA>              S SLEAS
#> 3         gonorrhoea-1.hl7      Smith                  John     123 FAKE STREET   LIVERPOOL   NSW     2000 199201010000      M      21R156637Q 20220317+1000       Urethra   Ceftriaxone <NA>              S SLEAS
#> 4         gonorrhoea-1.hl7      Smith                  John     123 FAKE STREET   LIVERPOOL   NSW     2000 199201010000      M      21R156637Q 20220317+1000       Urethra  Azithromycin <NA>              S SLEAS
#> 5         gonorrhoea-2.hl7  Flinstone                  Fred 302 COBBLESTONE WAY     BEDROCK   NSW     2000 198606010000      M     22R2347635O 20220315+1000        Penile    Penicillin 0.25      Resistant SLEAS
#> 6         gonorrhoea-2.hl7  Flinstone                  Fred 302 COBBLESTONE WAY     BEDROCK   NSW     2000 198606010000      M     22R2347635O 20220315+1000        Penile Ciprofloxacin <NA>              S SLEAS
#> 7         gonorrhoea-2.hl7  Flinstone                  Fred 302 COBBLESTONE WAY     BEDROCK   NSW     2000 198606010000      M     22R2347635O 20220315+1000        Penile   Ceftriaxone <NA>              S SLEAS
#> 8         gonorrhoea-2.hl7  Flinstone                  Fred 302 COBBLESTONE WAY     BEDROCK   NSW     2000 198606010000      M     22R2347635O 20220315+1000        Penile  Azithromycin <NA>              S SLEAS
#> 9  gonorrhoea-3_2_bugs.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      11Z474259Q 20220722+1000 Urethral swab          <NA> <NA>           <NA>  MEOW
#> 10 gonorrhoea-3_2_bugs.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      11Z474259Q 20220722+1000 Urethral swab    Penicillin <NA>              R  MEOW
#> 11 gonorrhoea-3_2_bugs.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      11Z474259Q 20220722+1000 Urethral swab Ciprofloxacin <NA>              R  MEOW
#> 12 gonorrhoea-3_2_bugs.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      11Z474259Q 20220722+1000 Urethral swab   Ceftriaxone <NA>              S  MEOW
#> 13 gonorrhoea-3_2_bugs.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      11Z474259Q 20220722+1000 Urethral swab  Azithromycin <NA>              S  MEOW
#> 14        gonorrhoea-3.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab          <NA> <NA>           <NA>  BEOW
#> 15        gonorrhoea-3.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab    Penicillin <NA>              R  BEOW
#> 16        gonorrhoea-3.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab Ciprofloxacin <NA>              R  BEOW
#> 17        gonorrhoea-3.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab   Ceftriaxone <NA>              S  BEOW
#> 18        gonorrhoea-3.hl7      HIJLK                 ABCDE                                           2010 199011190000      M      22R474259Q 20220722+1000 Urethral swab  Azithromycin <NA>              S  BEOW
#> 19 gonorrhoea-4_2_bugs.hl7       John                Blippi      789 Ten Street Burry Hills   NSW     2010 198809030000      M      22R197544P 20220328+1000        Rectum    Penicillin    1              S  MEOW
#> 20 gonorrhoea-4_2_bugs.hl7       John                Blippi      789 Ten Street Burry Hills   NSW     2010 198809030000      M      22R197544P 20220328+1000        Rectum Ciprofloxacin <NA>              R  MEOW
#> 21 gonorrhoea-4_2_bugs.hl7       John                Blippi      789 Ten Street Burry Hills   NSW     2010 198809030000      M      22R197544P 20220328+1000        Rectum   Ceftriaxone <NA>              S  MEOW
#> 22 gonorrhoea-4_2_bugs.hl7       John                Blippi      789 Ten Street Burry Hills   NSW     2010 198809030000      M      22R197544P 20220328+1000        Rectum  Azithromycin <NA>              S  MEOW