Reputation: 119
I have the following data.table:
library(data.table)
df <- data.table(id = c(1, 2, 3, 4),
                 medication = c("Abc de 3 MG", "Afg frt re 4 MG/ML", "Agh", "Aj yr 5 MG"))
which prints as:
   id         medication
1:  1        Abc de 3 MG
2:  2 Afg frt re 4 MG/ML
3:  3                Agh
4:  4         Aj yr 5 MG
I want to extract the doses from the medication column and create a new column called doses:
   id medication   doses
1:  1     Abc de    3 MG
2:  2 Afg frt re 4 MG/ML
3:  3        Agh    <NA>
4:  4      Aj yr    5 MG
The doses column should contain the number and the unit. Not every medication has a number and unit; those rows should be NA.
I looked at the tidyverse extract function but could not find a way to extract both the numeric and the character values.
I am using data.table with a large dataset, so a time-efficient function would be ideal.
Upvotes: 1
Views: 498
Reputation: 1364
This approach uses tidyr rather than data.table, but it may still be worth considering:
library(tidyr)
df %>%
separate(medication, into = c("medication", "doses"), sep = "(?=\\d)")
#   id medication   doses
# 1  1     Abc de    3 MG
# 2  2 Afg frt re 4 MG/ML
# 3  3        Agh    <NA>
# 4  4      Aj yr    5 MG
Upvotes: 1
Reputation: 887251
An option with extract from tidyr:
library(tidyr)
extract(df, medication, into = c('medication', 'doses'), '(.*)\\s+(\\d+\\s+\\D+)$')
#   id medication   doses
#1:  1     Abc de    3 MG
#2:  2 Afg frt re 4 MG/ML
#3:  3       <NA>    <NA>
#4:  4      Aj yr    5 MG
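Note that row 3 loses its medication name in this output, because the pattern does not match a string without a dose. If the name should be kept, one possible data.table variant is sketched below. It reuses the same "number, whitespace, unit at the end of the string" pattern; the object name dt and the variable pat are only illustrative, and the sketch assumes the dose always sits at the end of the string.

```r
library(data.table)

# Same sample data as in the question, under an illustrative name
dt <- data.table(id = c(1, 2, 3, 4),
                 medication = c("Abc de 3 MG", "Afg frt re 4 MG/ML", "Agh", "Aj yr 5 MG"))

# Whitespace, then a number, whitespace and a non-digit unit at the end of the string
pat <- "\\s+(\\d+\\s+\\D+)$"

# Fill doses only for rows where the pattern matches; the other rows get NA
dt[grepl(pat, medication, perl = TRUE),
   doses := sub(paste0("^.*", pat), "\\1", medication, perl = TRUE)]

# Remove the dose from the name; strings without a dose are left unchanged
dt[, medication := sub(pat, "", medication, perl = TRUE)]
dt
```

Both steps modify dt by reference, so no copy of a large table is made.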
Upvotes: 0
Reputation: 33488
Insert an @ (or any other character that is not in your column already) ahead of the first number, then use that to split the column into two:
df[, c("medication", "doses") := tstrsplit(sub("([0-9])", "@\\1", medication), "@")]
df
#    id medication   doses
# 1:  1     Abc de    3 MG
# 2:  2 Afg frt re 4 MG/ML
# 3:  3        Agh    <NA>
# 4:  4      Aj yr    5 MG
EDIT
A cleaner solution uses slightly more advanced regex (a positive lookahead); just remember to set perl = TRUE:
df[, c("medication", "doses") := tstrsplit(medication, ".(?=[0-9])", perl = TRUE)]
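One caveat with the lookahead pattern: the leading `.` consumes whichever character precedes a digit, so a multi-digit dose such as "10 MG" would also be split between its digits. A minimal sketch of a variant that splits only on whitespace before a digit follows. The sample data with "10 MG" is made up for illustration, and the sketch assumes each string contains at most one number.

```r
library(data.table)

# Hypothetical sample with a multi-digit dose (not from the question)
dt <- data.table(id = 1:3,
                 medication = c("Abc de 10 MG", "Agh", "Aj yr 5 MG"))

# Split on the whitespace immediately before a digit; the digits stay intact.
# Strings without a number yield a single piece, which tstrsplit pads with NA.
dt[, c("medication", "doses") := tstrsplit(medication, "\\s+(?=[0-9])", perl = TRUE)]
dt
```

If a string can hold several numbers, this still produces more than two pieces, so the pattern would need to be adapted to the actual data.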
Upvotes: 2
Reputation: 101818
Maybe you can try strsplit, as below:
df[-1] <- do.call(rbind,lapply(strsplit(df$medication,"(?<=[A-Za-z])\\s(?=[0-9])",perl = TRUE),`length<-`,2))
which gives
> df
  id medication.1 medication.2
1  1       Abc de         3 MG
2  2   Afg frt re      4 MG/ML
3  3          Agh         <NA>
4  4        Aj yr         5 MG
Upvotes: 0