Reputation: 2157
I have the following list of data frames:
df1 <- data.frame(x = 1:3, y=c("1,2","1,2,3","1,5"))
df2 <- data.frame(x = 4:6, y=c("1,2","1,4","1,6,7,8"))
filelist <- list(df1,df2)
> filelist
[[1]]
  x     y
1 1   1,2
2 2 1,2,3
3 3   1,5

[[2]]
  x       y
1 4     1,2
2 5     1,4
3 6 1,6,7,8
Now I want to split the 'y' column of each data frame on the ',' character and store the pieces in new columns of that data frame.
The output should look like this:
> filelist
[[1]]
  x y_ref y_alt1 y_alt2
1 1     1      2
2 2     1      2      3
3 3     1      5

[[2]]
  x y_ref y_alt1 y_alt2 y_alt3
1 4     1      2
2 5     1      4
3 6     1      6      7      8
How should I do this? I know that 'strsplit' splits a string on a character, but I don't see how to store its output in separate columns.
Upvotes: 1
Views: 107
Reputation: 1177
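Also, with tidyr (which works alongside dplyr) you can use separate() like this:
library(dplyr)
library(tidyr)
df1 %>%
  separate(y, into = c("y_ind", "y_alt1", "y_alt2"), sep = ",", fill = "right")
Here into needs one name per piece in the longest row (three for df1), and fill = "right" pads the shorter rows with NA without a warning.
Note that into can also be built programmatically, so that the required number of result columns is generated with proper indexing instead of each name being typed by hand.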
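As a rough sketch of that programmatic approach: count the pieces in the longest row of each data frame, build the into vector from that count, and apply the function over the list. The helper name split_y is made up for illustration, and the y_ref / y_altN names follow the output requested in the question.

```r
library(tidyr)

df1 <- data.frame(x = 1:3, y = c("1,2", "1,2,3", "1,5"))
df2 <- data.frame(x = 4:6, y = c("1,2", "1,4", "1,6,7,8"))
filelist <- list(df1, df2)

# split_y is a hypothetical helper, not part of tidyr
split_y <- function(df) {
  # Number of comma-separated pieces in the longest row
  n <- max(lengths(strsplit(as.character(df$y), ",", fixed = TRUE)))
  # One name per piece: y_ref first, then y_alt1, y_alt2, ...
  into <- c("y_ref", paste0("y_alt", seq_len(n - 1)))
  # fill = "right" pads shorter rows with NA on the right
  separate(df, y, into = into, sep = ",", fill = "right")
}

result <- lapply(filelist, split_y)
```

Each element of result should then have x followed by as many split columns as its own longest row needs, so the two data frames can end up with different widths.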
Upvotes: 0
Reputation: 11255
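Here's a solution that relies on tstrsplit from data.table:
library(data.table)
lapply(filelist,
       function(DF) {
         # Count the commas in each row; the maximum gives the number of 'alt' columns
         commas = max(nchar(as.character(DF$y)) - nchar(gsub(",", "", DF$y)))
         # tstrsplit() returns one vector per piece, padding shorter rows with NA
         DF[, c('y_ind', paste0('y_alt', seq_len(commas)))] = tstrsplit(as.character(DF$y), ',')
         DF
       })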
#> [[1]]
#>   x     y y_ind y_alt1 y_alt2
#> 1 1   1,2     1      2   <NA>
#> 2 2 1,2,3     1      2      3
#> 3 3   1,5     1      5   <NA>
#>
#> [[2]]
#>   x       y y_ind y_alt1 y_alt2 y_alt3
#> 1 4     1,2     1      2   <NA>   <NA>
#> 2 5     1,4     1      4   <NA>   <NA>
#> 3 6 1,6,7,8     1      6      7      8
Created on 2019-09-17 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 388797
Apply cSplit to the "y" column of each data frame in filelist:
lapply(filelist, splitstackshape::cSplit, "y")
#[[1]]
#   x y_1 y_2 y_3
#1: 1   1   2  NA
#2: 2   1   2   3
#3: 3   1   5  NA
#[[2]]
#   x y_1 y_2 y_3 y_4
#1: 4   1   2  NA  NA
#2: 5   1   4  NA  NA
#3: 6   1   6   7   8
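If the y_ref / y_altN names from the question are needed, the split columns can be renamed afterwards. This is only a sketch: split_and_rename is a made-up helper, and it assumes cSplit names the new columns y_1, y_2, ... as in the output above.

```r
library(splitstackshape)
library(data.table)

df1 <- data.frame(x = 1:3, y = c("1,2", "1,2,3", "1,5"))
df2 <- data.frame(x = 4:6, y = c("1,2", "1,4", "1,6,7,8"))
filelist <- list(df1, df2)

# split_and_rename is a hypothetical helper, not part of splitstackshape
split_and_rename <- function(df) {
  out <- cSplit(df, "y", sep = ",")
  # Columns created by cSplit (assumed to start with "y_")
  old <- grep("^y_", names(out), value = TRUE)
  # First piece becomes y_ref, the rest y_alt1, y_alt2, ...
  new <- c("y_ref", paste0("y_alt", seq_len(length(old) - 1)))
  setnames(out, old, new)
  out
}

renamed <- lapply(filelist, split_and_rename)
```

setnames changes the names by reference, which is safe here because cSplit has already returned a new data.table rather than the original data frame.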
Upvotes: 2