I have a column in a data frame called parm_value that I would like to split into two columns, lower_bound and upper_bound, based on the position of the underscore in the field. I have been trying to use a combination of grep and substring, with no success.
Current dataframe format:
parm_value
1 30_34
2 60_64
3 65_69
4 75_79
5 90_94
Desired data frame format:
parm_value lower_bound upper_bound
1 30_34 30 34
2 60_64 60 64
3 65_69 65 69
4 75_79 75 79
5 90_94 90 94
I have been trying things like:
library(dplyr)
dat02 <-
  dat01 %>%
  mutate(lower_bound = substring(parm_value, 1, grep("_", parm_value) - 1))
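One likely reason the attempt above fails: grep() returns the indices of the elements that contain "_", not the character position of the underscore inside each string. regexpr() returns that position instead, so a minimal base R sketch of the substring() idea could look like this (the dat01 contents are assumed from the example table; the same substring()/regexpr() expressions should also work inside the mutate() call):

```r
# Assumed example data mirroring the question's parm_value column
dat01 <- data.frame(parm_value = c("30_34", "60_64", "65_69", "75_79", "90_94"),
                    stringsAsFactors = FALSE)

# regexpr() gives the character position of the first "_" in each string
# (-1 if there is none); fixed = TRUE treats "_" as a literal character
pos <- regexpr("_", dat01$parm_value, fixed = TRUE)

# Characters before the underscore form the lower bound, those after it the upper bound
dat01$lower_bound <- as.integer(substring(dat01$parm_value, 1, pos - 1))
dat01$upper_bound <- as.integer(substring(dat01$parm_value, pos + 1))

dat01
```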
Try read.table:
cbind(df1[1], read.table(text = as.character(df1$parm_value), sep = "_",
                         col.names = c('lower_bound', 'upper_bound')))
# parm_value lower_bound upper_bound
#1 30_34 30 34
#2 60_64 60 64
#3 65_69 65 69
#4 75_79 75 79
#5 90_94 90 94
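read.table() works here because it treats each string as one line of a file, with "_" as the field separator, and then applies its usual type detection to the resulting fields. A self-contained sketch, assuming df1 holds the example data as character strings:

```r
# Assumed input; stringsAsFactors = FALSE keeps the column as character on R < 4.0
df1 <- data.frame(parm_value = c("30_34", "60_64", "65_69", "75_79", "90_94"),
                  stringsAsFactors = FALSE)

# Each element of the character vector is read as one line with two "_"-separated
# fields; type detection turns both fields into integer columns
bounds <- read.table(text = df1$parm_value, sep = "_",
                     col.names = c("lower_bound", "upper_bound"))

out <- cbind(df1[1], bounds)
str(out)  # shows the column types of the combined data frame
```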
Or separate from tidyr:
library(tidyr)
separate(df1, parm_value, into = c('lower_bound', 'upper_bound'), sep = "_",
         remove = FALSE, convert = TRUE)
# parm_value lower_bound upper_bound
#1 30_34 30 34
#2 60_64 60 64
#3 65_69 65 69
#4 75_79 75 79
#5 90_94 90 94
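If you would rather avoid the tidyr dependency, a base R sketch using regular-expression substitution with sub() (a swapped-in technique, not part of this answer) produces the same two columns; df1 is assumed to hold the example data:

```r
# Assumed input matching the question
df1 <- data.frame(parm_value = c("30_34", "60_64", "65_69", "75_79", "90_94"),
                  stringsAsFactors = FALSE)

# "_.*" matches the underscore and everything after it; deleting it leaves the lower bound
df1$lower_bound <- as.integer(sub("_.*", "", df1$parm_value))
# ".*_" matches everything up to and including the (last) underscore;
# deleting it leaves the upper bound
df1$upper_bound <- as.integer(sub(".*_", "", df1$parm_value))

df1
```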
You could also use cSplit from splitstackshape:
library(splitstackshape)
library(data.table)  # provides setnames()
out <- cbind(dat, setnames(cSplit(dat, "parm_value", "_", fixed = FALSE),
                           c("lower_bound", "upper_bound")))
#> out
# parm_value lower_bound upper_bound
#1 30_34 30 34
#2 60_64 60 64
#3 65_69 65 69
#4 75_79 75 79
#5 90_94 90 94
Use strsplit:
library(data.table)
xmpl <- data.table(val = rep("65_45", 5))
xmpl[ , lower := sapply(strsplit(val, "_"), "[[", 1)]
xmpl[ , upper := sapply(strsplit(val, "_"), "[[", 2)]
xmpl
# val lower upper
# 1: 65_45 65 45
# 2: 65_45 65 45
# 3: 65_45 65 45
# 4: 65_45 65 45
# 5: 65_45 65 45
If the table is really large, you can save run time by running strsplit only once, storing the result, and reusing that object when defining the new data.table columns.
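A minimal sketch of the split-once idea, using a plain base R data frame instead of data.table (df and its contents are assumptions based on the question's example):

```r
# Assumed input matching the question
df <- data.frame(parm_value = c("30_34", "60_64", "65_69", "75_79", "90_94"),
                 stringsAsFactors = FALSE)

# Split every string a single time; parts is a list of length-2 character vectors
parts <- strsplit(df$parm_value, "_", fixed = TRUE)

# Reuse the same list for both new columns instead of calling strsplit twice
df$lower_bound <- as.integer(sapply(parts, "[[", 1))
df$upper_bound <- as.integer(sapply(parts, "[[", 2))

df
```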
strsplit returns a list:
strsplit("65_45", "_")
# [[1]]
# [1] "65" "45"
The sapply call iterates through the list with the subsetting function [[, selecting the Nth item of each element, where N is passed to sapply as sapply(some_list, "[[", N).
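To see what sapply(some_list, "[[", N) does, it helps to call the subsetting function directly. A small sketch on made-up strings in the question's format:

```r
# Made-up sample strings in the question's format
parts <- strsplit(c("30_34", "60_64", "65_69"), "_")

# `[[` is an ordinary function: `[[`(x, 2) is the same as x[[2]]
`[[`(parts[[1]], 2)                # "34"

# sapply() calls `[[`(element, N) on each list element, forwarding N through ...
firsts  <- sapply(parts, "[[", 1)  # "30" "60" "65"
seconds <- sapply(parts, "[[", 2)  # "34" "64" "69"

# The same selection written with an explicit anonymous function
seconds_explicit <- sapply(parts, function(x) x[[2]])
```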
If your data.frame is called df, you can try:
cbind(df, `colnames<-`(do.call("rbind", strsplit(as.character(df[, 1]), "_")), c("lower_bound", "upper_bound")))
# parm_value lower_bound upper_bound
# 1 30_34 30 34
# 2 60_64 60 64
# 3 65_69 65 69
# 4 75_79 75 79
# 5 90_94 90 94
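do.call(rbind, ...) stacks the length-2 vectors returned by strsplit() into a two-column character matrix. Because those columns are still text, a sketch that converts the matrix to integers before binding it could look like this (df is assumed to hold the example data):

```r
# Assumed input matching the question
df <- data.frame(parm_value = c("30_34", "60_64", "65_69", "75_79", "90_94"),
                 stringsAsFactors = FALSE)

# One row per input string, one column per piece: a 5 x 2 character matrix
m <- do.call(rbind, strsplit(df$parm_value, "_", fixed = TRUE))
colnames(m) <- c("lower_bound", "upper_bound")

# Convert the matrix to integer storage; dim and column names are kept
storage.mode(m) <- "integer"

out <- cbind(df, m)
out
```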