Reputation: 1644
I would like to split strings in my dataframe using stringr.
The following is my dataframe:
df<-data.frame(ID = 1:26,
DRUG_STRENGTH = c("50 MG", "1250 MG", "20 MG", "200 MG", "2MG", "60MG", NA, "300IU",
NA, "600 MG", "500MG", "625MG", NA, NA, "50MG/ML", "40MG", "200MG",
"200MG", "200MG", "5 MG", "5 MG", "200MG", "300IU/3ML", "0.05%",
"112.5 BILLION", "10.8MG"))
My desired dataframe is:
# > df
# ID DRUG_STRENGTH DRUG_STRENGTH_NO DRUG_STRENGTH_UNIT
# 1 1 50 MG 50 MG
# 2 2 1250 MG 1250 MG
# 3 3 20 MG 20 MG
# 4 4 200 MG 200 MG
# 5 5 2MG 2 MG
# 6 6 60MG 60 MG
# 7 7 <NA> <NA> <NA>
# 8 8 300IU 300 IU
# 9 9 <NA> <NA> <NA>
# 10 10 600 MG 600 MG
# 11 11 500MG 500 MG
# 12 12 625MG 625 MG
# 13 13 <NA> <NA> <NA>
# 14 14 <NA> <NA> <NA>
# 15 15 50MG/ML 50 MG/ML
# 16 16 40MG 40 MG
# 17 17 200MG 200 MG
# 18 18 200MG 200 MG
# 19 19 200MG 200 MG
# 20 20 5 MG 5 MG
# 21 21 5 MG 5 MG
# 22 22 200MG 200 MG
# 23 23 300IU/3ML 300 IU/3ML
# 24 24 0.05% 0.05 %
# 25 25 112.5 BILLION 112.5 BILLION
# 26 26 10.8MG 10.8 MG
My code gives me my desired df but I would like to ask if there is a nicer way to write the regular expressions.
library(dplyr)
library(stringr)
df <- df %>%
  mutate(DRUG_STRENGTH_NO = str_extract(DRUG_STRENGTH, pattern = "^\\d\\.?\\d?\\.?\\d?\\.?\\d*"),
         DRUG_STRENGTH_UNIT = str_trim(str_replace(DRUG_STRENGTH, pattern = "^\\d\\.?\\d?\\.?\\d?\\.?\\d*", replacement = "")))
Upvotes: 1
Views: 52
Reputation: 46
Alternatively, if you make sure the number and the remainder are always separated by a delimiter such as a space, you could use strsplit or str_split (with or without simplify = TRUE). Regular expressions are more flexible, but they can become messy in more complicated cases.
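A rough sketch of that idea, assuming stringr is available. The vector `x` is a hypothetical sample modelled on the question's data (NA values are left out here and would need checking separately), and the normalisation step that inserts a space after the leading number is my own addition, not something the question already does:

```r
library(stringr)

# Hypothetical sample values modelled on the question's DRUG_STRENGTH column
x <- c("50 MG", "2MG", "300IU/3ML", "112.5 BILLION")

# Assumed normalisation step: force exactly one space after the leading number,
# whether or not the original value had one
x_spaced <- str_replace(x, "^([0-9.]+)\\s*", "\\1 ")

# Split on the first space only, so units such as "IU/3ML" stay in one piece
parts <- str_split_fixed(x_spaced, " ", n = 2)

number <- parts[, 1]
unit   <- parts[, 2]
```

Note that the normalisation step already needs a regular expression, so for this particular data the split mainly helps when the values are space-separated to begin with.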
Upvotes: 0
Reputation: 193517
I'd use extract for this:
library(tidyverse)
df %>%
extract(DRUG_STRENGTH, into = c("No", "Unit"), "([0-9.]+)(.*)", remove = FALSE)
## ID DRUG_STRENGTH No Unit
## 1 1 50 MG 50 MG
## 2 2 1250 MG 1250 MG
## 3 3 20 MG 20 MG
## 4 4 200 MG 200 MG
## 5 5 2MG 2 MG
## 6 6 60MG 60 MG
## 7 7 <NA> <NA> <NA>
## 8 8 300IU 300 IU
## 9 9 <NA> <NA> <NA>
## 10 10 600 MG 600 MG
## 11 11 500MG 500 MG
## 12 12 625MG 625 MG
## 13 13 <NA> <NA> <NA>
## 14 14 <NA> <NA> <NA>
## 15 15 50MG/ML 50 MG/ML
## 16 16 40MG 40 MG
## 17 17 200MG 200 MG
## 18 18 200MG 200 MG
## 19 19 200MG 200 MG
## 20 20 5 MG 5 MG
## 21 21 5 MG 5 MG
## 22 22 200MG 200 MG
## 23 23 300IU/3ML 300 IU/3ML
## 24 24 0.05% 0.05 %
## 25 25 112.5 BILLION 112.5 BILLION
## 26 26 10.8MG 10.8 MG
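You may need to go back and trim whitespace afterwards: for values such as "50 MG", the (.*) group also captures the leading space, so Unit comes out as " MG" rather than "MG".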
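One possible way to avoid the extra trimming step is to let the pattern consume optional whitespace between the two groups. This is a sketch on a small hypothetical subset of the question's data (`df_small` is an illustrative name), not a tested replacement for the full column:

```r
library(tidyr)

# Small hypothetical subset of the question's data, including an NA
df_small <- data.frame(
  ID = 1:5,
  DRUG_STRENGTH = c("50 MG", "2MG", NA, "300IU/3ML", "0.05%"),
  stringsAsFactors = FALSE
)

# \\s* sits between the capture groups, so any whitespace separating the
# number from the unit is matched but not captured
res <- tidyr::extract(
  df_small, DRUG_STRENGTH,
  into = c("No", "Unit"),
  regex = "([0-9.]+)\\s*(.*)",
  remove = FALSE
)
```

Rows where DRUG_STRENGTH is NA stay NA in both new columns. Trailing whitespace, if the real data has any, would still end up in Unit and need trimming.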
Upvotes: 2