HL7-to-linelist-Gonorrhoea
HL7-to-linelist-Gonorrhoea.Rmd
This vignette provides an example HL7 to linelist example.
library(HL7.R)
#> HL7.R - for help see vignette('Getting-started', package = 'HL7.R') or vignette('package = 'HL7.R') for other examples
The task
Antibiotic resistance results needed extraction from HL7 messages. There were multiple results within each message. The goal was to create a table with 1 row per result along with personal and sample identifiers.
The data
There are 2 Gonorrhoea HL7 files containing antibiotic resistance results within this package.
gonorrhoea_files <-
system.file('extdata', package = 'HL7.R') %>%
list.files(pattern = 'gonorrhoea', full.names = TRUE)
gonorrhoea_files
#> [1] "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-1.hl7" "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-2.hl7" "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-3_2_bugs.hl7" "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-3.hl7"
#> [5] "/home/runner/work/_temp/Library/HL7.R/extdata/gonorrhoea-4_2_bugs.hl7"
Data-wrangling
This section explains several steps that were later combined into a single wrangle function. Thus, we begin with a single parsed HL7.
msg <- gonorrhoea_parsed_list[[4]]
Extraction function
We expect the possibility of completely missing fields from parsing.
For sake of example, let us pretend there could be a 5th element in
PatientName
. Selection of this element when missing
produces an error
msg$PID$PatientName[[5]]
#> Error in msg$PID$PatientName[[5]]: subscript out of bounds
What we want is for it to return a ‘blank’ value. The function
purrr::pluck
allows for a default value when a list element
is not found
purrr::pluck(msg$PID$PatientName, 5)
#> NULL
purrr::pluck(msg$PID$PatientName, 5, .default = "")
#> [1] ""
or we can roll our own
my_pluck <- function(x, .default = ''){
tryCatch(x, error = function(e) '')
}
my_pluck(msg$PID$PatientName[[5]])
#> [1] ""
# and correct otherwise
my_pluck(msg$PID$PatientName[[2]])
#> [1] "HIJLK"
regardless, to save on typing, we create an alias function.
# p = purrr::pluck with .default = ''
p <- function(x, ...){
purrr::pluck(x, ..., .default = '')
}
Personal identifiers
Personal identifiers occur only once in the message. We can safely
extract them into a pid
data.frame object. Note the use of
with()
to further save on typing.
pid <-
with(msg$PID, {
data.frame(
first_name = p(PatientName, 2),
middle_name = p(PatientName, 3),
last_name = p(PatientName, 1),
street_address = p(PatientAddress, 1),
suburb = p(PatientAddress, 3),
state = p(PatientAddress, 4),
postcode = p(PatientAddress, 5),
dob = DateTimeOfBirth,
gender = Sex)
})
pid
#> first_name middle_name last_name street_address suburb state postcode dob gender
#> 1 HIJLK ABCDE 2010 199011190000 M
Sample number and date
Simple extraction from the observation request segment
OBR
.
obr <-
with(msg$OBR, {
data.frame(specimen_number = p(FillerOrderNumber, 1),
specimen_date = p(RequestedDatetime, 1)
)
})
obr
#> specimen_number specimen_date
#> 1 22R474259Q 20220722+1000
Results data
The part is complex. The data is held among multiple OBX
segments. We need to go through them and extract certain parts, as well
as conditionally output values depending on what we find.
obxs <- grep('OBX', names(msg))
obxs # indexes of OBX in message
#> [1] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
Sample site
Extract the sample site from the following OBX segment:
OBX|1|ST|M6706^Site^NATA1234|1|Urethra||||||F||0|202203171808+1000|JOHNWICK
We expect the segment to have a
x$ValueType == 'ST' & x$ObservationIdentifier[[2]] == 'Site'
.
If this is found extract p(x$ObservationValue, 1)
(with
x
being the selected segment) otherwise output
""
. Note I’ve used element names, but positions also work:
x[[2]] == 'ST' & x[[3]][[2]] == 'Site'
, and
x[[5]][[1]]
.
obxs_st_site <-
sapply(msg[obxs], function(x){
x$ValueType == 'ST' & x$ObservationIdentifier[[2]] == 'Site'
})
if (any(obxs_st_site)) {
msg_site_ind <- obxs[obxs_st_site]
obx_site <- data.frame(site = p(msg[[msg_site_ind]]$ObservationValue, 1))
} else {
obx_site <- data.frame(site = NA_character_)
}
obx_site
#> site
#> 1 Urethral swab
Antibiotic resistance results
Again, we look for results and act upon them if found, from
OBX|7|CE|18906-8^Ciprofloxacin Susc Islt^LN^M20007^Ciprofloxacin^NATA1234|7|S^Susceptible^L|...
OBX|8|CE|18895-3^Ceftriaxone Susc Islt^LN^M20062^Ceftriaxone^NATA1234|8|S^Susceptible^L|...
The step returns NULL
for segments without any hits, so
we remove them.
antibiotics_OR_grep <- "Penicillin|Ciprofloxacin|Ceftriaxone|Azithromycin"
obx_tmp <-
lapply(msg[obxs], function(x){
# TODO could examine code in position 1
# TODO could match values rather than string manipulation
obx_id <- p(x$ObservationIdentifier, 2)
if (grepl(pattern = antibiotics_OR_grep, x = obx_id, ignore.case = TRUE) ){
if (grepl(' Susc ', obx_id)) {
data.frame(antibiotic = sub(' .*', '', obx_id),
type = 'susceptibility',
value = p(x$ObservationValue, 1))
} else if (grepl('MIC', obx_id)) {
data.frame(antibiotic = sub(' .*', '', obx_id),
type = 'mic',
value = p(x$ObservationValue, 1))
}
}
})
is.not.null <- function(x) { !is.null(x) }
obx_not_nulls <- sapply(obx_tmp, is.not.null)
obx_tmp[obx_not_nulls]
#> $OBX.9
#> antibiotic type value
#> 1 Penicillin susceptibility R
#>
#> $OBX.10
#> antibiotic type value
#> 1 Ciprofloxacin susceptibility R
#>
#> $OBX.11
#> antibiotic type value
#> 1 Ceftriaxone susceptibility S
#>
#> $OBX.12
#> antibiotic type value
#> 1 Azithromycin susceptibility S
#>
#> $OBX.15
#> antibiotic type value
#> 1 Penicillin susceptibility R
And further wrangle the results - row bind, complete results for 2 test types, and do some conversions. Some tests had categorical outputs (susceptibility categories), while others we numerical (MIC) and needed conversion.
categorise_etest_mic <- function(x){
x <- sub('<', '', x)
x <- as.numeric(x)
dplyr::case_when(
x < 0.002 ~ 'Susceptible',
x < 0.05 ~ 'Less Susceptible',
TRUE ~ 'Resistant'
)
}
if (any(obx_not_nulls)){
obx_result <-
obx_tmp[obx_not_nulls] %>%
do.call(dplyr::bind_rows, .) %>%
# TODO - capture multiple strain information (was done, likely not PR'd before leaving)
dplyr::distinct(antibiotic, type, value) %>%
tidyr::complete(type = c('susceptibility', 'mic')) %>%
tidyr::pivot_wider(names_from = 'type', values_from = 'value') %>%
dplyr::mutate(susceptibility = ifelse(test = is.na(susceptibility) & !is.na(mic),
yes = categorise_etest_mic(mic),
no = susceptibility))
} else {
obx_result <- data.frame(antibiotic = NA, susceptibility = NA, mic = NA)
}
obx_result
#> # A tibble: 5 × 3
#> antibiotic mic susceptibility
#> <chr> <chr> <chr>
#> 1 NA NA NA
#> 2 Penicillin NA R
#> 3 Ciprofloxacin NA R
#> 4 Ceftriaxone NA S
#> 5 Azithromycin NA S
Completing the linelist
msh <- data.frame(lab = p(msg$MSH$SendingFacility, 1))
cbind.data.frame(file = attr(msg, 'filename'), pid, obr, obx_site, obx_result, msh)
#> file first_name middle_name last_name street_address suburb state postcode dob gender specimen_number specimen_date site antibiotic mic susceptibility lab
#> 1 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab <NA> <NA> <NA> BEOW
#> 2 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Penicillin <NA> R BEOW
#> 3 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Ciprofloxacin <NA> R BEOW
#> 4 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Ceftriaxone <NA> S BEOW
#> 5 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Azithromycin <NA> S BEOW
As a complete function
Being happy with the code above, they were combined into a single function.
extract_pid_and_antibiotic_data <- function(msg, antibiotics = c('Penicillin', 'Ciprofloxacin', 'Ceftriaxone', 'Azithromycin')){
antibiotics_OR_grep <- paste0(antibiotics, collapse = "|")
p <- function(x,...){ purrr::pluck(x,...,.default = '') }
pid <-
with(msg$PID, {
data.frame(
first_name = p(PatientName, 2),
middle_name = p(PatientName, 3),
last_name = p(PatientName, 1),
street_address = p(PatientAddress, 1),
suburb = p(PatientAddress, 3),
state = p(PatientAddress, 4),
postcode = p(PatientAddress, 5),
dob = DateTimeOfBirth,
gender = Sex)
})
obr <-
with(msg$OBR, {
data.frame(specimen_number = p(FillerOrderNumber, 1),
specimen_date = p(RequestedDatetime, 1)
)
})
obxs <- grep('OBX', names(msg))
obxs_st_site <-
sapply(msg[obxs], function(x){
x$ValueType == 'ST' & x$ObservationIdentifier[[2]] == 'Site'
})
if (any(obxs_st_site)) {
msg_site_ind <- obxs[obxs_st_site]
obx_site <- data.frame(site = p(msg[[msg_site_ind]]$ObservationValue, 1))
} else {
obx_site <- data.frame(site = NA_character_)
}
# malformed HL7?
if (is.na(obx_site$site)){
obxs_st_site <-
sapply(msg[obxs], function(x){
x$ObservationIdentifier[[2]] == 'Site'
})
if (any(obxs_st_site)) {
msg_site_ind <- obxs[obxs_st_site]
warning('This message did not have ValueType = ST but the site ',
p(msg[[msg_site_ind]]$ObservationValue, 1), ' was found.')
}
}
obx_tmp <-
lapply(msg[obxs], function(x){
# TODO could examine code in position 1
# TODO could match values rather than string manipulation
obx_id <- p(x$ObservationIdentifier, 2)
if (grepl(pattern = antibiotics_OR_grep, x = obx_id, ignore.case = TRUE) ){
if (grepl(' Susc ', obx_id)) {
data.frame(antibiotic = sub(' .*', '', obx_id),
type = 'susceptibility',
value = p(x$ObservationValue, 1))
} else if (grepl('MIC', obx_id)) {
data.frame(antibiotic = sub(' .*', '', obx_id),
type = 'mic',
value = p(x$ObservationValue, 1))
}
}
})
is.not.null <- function(x) { !is.null(x) }
obx_not_nulls <- sapply(obx_tmp, is.not.null)
if (any(obx_not_nulls)){
obx_result <-
obx_tmp[obx_not_nulls] %>%
do.call(dplyr::bind_rows, .) %>%
# TODO - capture multiple strain information (was done, likely not PR'd before leaving)
dplyr::distinct(antibiotic, type, value) %>%
tidyr::complete(type = c('susceptibility', 'mic')) %>%
tidyr::pivot_wider(names_from = 'type', values_from = 'value') %>%
dplyr::mutate(susceptibility = ifelse(test = is.na(susceptibility) & !is.na(mic),
yes = categorise_etest_mic(mic),
no = susceptibility))
} else {
obx_result <- data.frame(antibiotic = NA, susceptibility = NA, mic = NA)
}
msh <- data.frame(lab = p(msg$MSH$SendingFacility, 1))
#msh = data.frame(lab = 'LAB_33', test = '698-1')
cbind.data.frame(file = attr(msg, 'filename'), pid, obr, obx_site, obx_result, msh)
}
Running the function
gonorrhoea_parsed_list %>%
lapply(extract_pid_and_antibiotic_data) %>%
do.call(dplyr::bind_rows, .)
#> file first_name middle_name last_name street_address suburb state postcode dob gender specimen_number specimen_date site antibiotic mic susceptibility lab
#> 1 gonorrhoea-1.hl7 Smith John 123 FAKE STREET LIVERPOOL NSW 2000 199201010000 M 21R156637Q 20220317+1000 Urethra Penicillin 0.25 Resistant SLEAS
#> 2 gonorrhoea-1.hl7 Smith John 123 FAKE STREET LIVERPOOL NSW 2000 199201010000 M 21R156637Q 20220317+1000 Urethra Ciprofloxacin <NA> S SLEAS
#> 3 gonorrhoea-1.hl7 Smith John 123 FAKE STREET LIVERPOOL NSW 2000 199201010000 M 21R156637Q 20220317+1000 Urethra Ceftriaxone <NA> S SLEAS
#> 4 gonorrhoea-1.hl7 Smith John 123 FAKE STREET LIVERPOOL NSW 2000 199201010000 M 21R156637Q 20220317+1000 Urethra Azithromycin <NA> S SLEAS
#> 5 gonorrhoea-2.hl7 Flinstone Fred 302 COBBLESTONE WAY BEDROCK NSW 2000 198606010000 M 22R2347635O 20220315+1000 Penile Penicillin 0.25 Resistant SLEAS
#> 6 gonorrhoea-2.hl7 Flinstone Fred 302 COBBLESTONE WAY BEDROCK NSW 2000 198606010000 M 22R2347635O 20220315+1000 Penile Ciprofloxacin <NA> S SLEAS
#> 7 gonorrhoea-2.hl7 Flinstone Fred 302 COBBLESTONE WAY BEDROCK NSW 2000 198606010000 M 22R2347635O 20220315+1000 Penile Ceftriaxone <NA> S SLEAS
#> 8 gonorrhoea-2.hl7 Flinstone Fred 302 COBBLESTONE WAY BEDROCK NSW 2000 198606010000 M 22R2347635O 20220315+1000 Penile Azithromycin <NA> S SLEAS
#> 9 gonorrhoea-3_2_bugs.hl7 HIJLK ABCDE 2010 199011190000 M 11Z474259Q 20220722+1000 Urethral swab <NA> <NA> <NA> MEOW
#> 10 gonorrhoea-3_2_bugs.hl7 HIJLK ABCDE 2010 199011190000 M 11Z474259Q 20220722+1000 Urethral swab Penicillin <NA> R MEOW
#> 11 gonorrhoea-3_2_bugs.hl7 HIJLK ABCDE 2010 199011190000 M 11Z474259Q 20220722+1000 Urethral swab Ciprofloxacin <NA> R MEOW
#> 12 gonorrhoea-3_2_bugs.hl7 HIJLK ABCDE 2010 199011190000 M 11Z474259Q 20220722+1000 Urethral swab Ceftriaxone <NA> S MEOW
#> 13 gonorrhoea-3_2_bugs.hl7 HIJLK ABCDE 2010 199011190000 M 11Z474259Q 20220722+1000 Urethral swab Azithromycin <NA> S MEOW
#> 14 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab <NA> <NA> <NA> BEOW
#> 15 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Penicillin <NA> R BEOW
#> 16 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Ciprofloxacin <NA> R BEOW
#> 17 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Ceftriaxone <NA> S BEOW
#> 18 gonorrhoea-3.hl7 HIJLK ABCDE 2010 199011190000 M 22R474259Q 20220722+1000 Urethral swab Azithromycin <NA> S BEOW
#> 19 gonorrhoea-4_2_bugs.hl7 John Blippi 789 Ten Street Burry Hills NSW 2010 198809030000 M 22R197544P 20220328+1000 Rectum Penicillin 1 S MEOW
#> 20 gonorrhoea-4_2_bugs.hl7 John Blippi 789 Ten Street Burry Hills NSW 2010 198809030000 M 22R197544P 20220328+1000 Rectum Ciprofloxacin <NA> R MEOW
#> 21 gonorrhoea-4_2_bugs.hl7 John Blippi 789 Ten Street Burry Hills NSW 2010 198809030000 M 22R197544P 20220328+1000 Rectum Ceftriaxone <NA> S MEOW
#> 22 gonorrhoea-4_2_bugs.hl7 John Blippi 789 Ten Street Burry Hills NSW 2010 198809030000 M 22R197544P 20220328+1000 Rectum Azithromycin <NA> S MEOW