Reputation: 781
I have a data frame in which I want to create some new variables and update the old ones. Doing this by hand works for a few columns, but when there are many columns I don't know how to put it in a loop or how to use mapply or lapply.
df <- data.frame(x = c("A", "A", "A,S"),
                 y = c("12", "12,4", "10"),
                 z = c("String,Text", "Avoid", "Use"))
> df
    x    y           z
1   A   12 String,Text
2   A 12,4       Avoid
3 A,S   10         Use
I create some new variables:
df$x_sub <- substring(sub("^[^,]*", "",df$x),2)
df$x <- sub("\\,.*", "",df$x)
df$y_sub <- substring(sub("^[^,]*", "",df$y),2)
df$y <- sub("\\,.*", "",df$y)
df$z_sub <- substring(sub("^[^,]*", "",df$z),2)
df$z <- sub("\\,.*", "",df$z)
The output is correct, but if I have 10 variables, how can I avoid repeating these two lines for every column?
  x  y      z x_sub y_sub z_sub
1 A 12 String              Text
2 A 12  Avoid           4
3 A 10    Use     S
Upvotes: 1
Views: 223
Reputation: 886948
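We can do this with str_extract from stringr, applying it to every column with lapply: the first call keeps the part before the first comma, and the second extracts the part that follows a comma into new "_sub" columns.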
library(stringr)
df1 <- df
df1[] <- lapply(df, function(x) type.convert(str_extract(x, "^[^,]+"), as.is = TRUE))
df1[paste0(names(df1), "_sub")] <- lapply(df, function(x)
    type.convert(str_extract(x, "(?<=,)[^,]+"), as.is = TRUE))
df1
#  x  y      z x_sub y_sub z_sub
#1 A 12 String  <NA>    NA  Text
#2 A 12  Avoid  <NA>     4  <NA>
#3 A 10    Use     S    NA  <NA>
Another option is cSplit from splitstackshape, which splits every listed column on the separator in a single call:
library(splitstackshape)
cSplit(df, names(df), ",")
#   x_1 x_2 y_1 y_2    z_1  z_2
#1:   A  NA  12  NA String Text
#2:   A  NA  12   4  Avoid   NA
#3:   A   S  10  NA    Use   NA
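If adding a package is not an option, the same split can be sketched in base R by looping over the column names and reusing the two sub() calls from the question. This is only a sketch: stringsAsFactors = FALSE is an assumption added here so the columns are character vectors on older R versions, and df2 is a name introduced purely for illustration.

```r
# Same sample data as in the question; stringsAsFactors = FALSE is an
# assumption added so the columns are plain character vectors.
df <- data.frame(x = c("A", "A", "A,S"),
                 y = c("12", "12,4", "10"),
                 z = c("String,Text", "Avoid", "Use"),
                 stringsAsFactors = FALSE)

df2 <- df  # hypothetical name for the result
for (nm in names(df)) {
  # Part after the first comma ("" when the value has no comma)
  df2[[paste0(nm, "_sub")]] <- substring(sub("^[^,]*", "", df[[nm]]), 2)
  # Part before the first comma
  df2[[nm]] <- sub(",.*", "", df[[nm]])
}
df2
```

Unlike the str_extract version, this leaves empty strings rather than NA where a value has no comma, and it does not convert column types, so it should reproduce the output shown in the question.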
Upvotes: 1