user1987607

Reputation: 2157

Split columns by character in a list of data frames

I have the following list of data frames:

df1 <- data.frame(x = 1:3, y=c("1,2","1,2,3","1,5"))
df2 <- data.frame(x = 4:6, y=c("1,2","1,4","1,6,7,8"))
filelist <- list(df1,df2)

> filelist
[[1]]
     x     y
   1 1   1,2
   2 2 1,2,3
   3 3   1,5

[[2]]
    x       y
  1 4     1,2
  2 5     1,4
  3 6 1,6,7,8

Now I want to split column 'y' of each data frame on the ',' character and store the pieces in new columns of that data frame.

The output should look like this:

> filelist
[[1]]
   x     y_ref   y_alt1    y_alt2
1  1         1        2
2  2         1        2         3
3  3         1        5

[[2]]
   x     y_ref   y_alt1    y_alt2     y_alt3
 1 4         1        2
 2 5         1        4
 3 6         1        6         7          8

How can I do this? I know strsplit() splits a string on a character, but I don't see how to store its output in separate columns.
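A minimal base-R sketch of the strsplit() route, assuming the y_ref / y_alt naming from the desired output above; split_column() is a hypothetical helper name, not part of any package. It pads every row's pieces with NA up to the length of the longest row so they can be bound into a matrix, then replaces y with those columns.

```r
df1 <- data.frame(x = 1:3, y = c("1,2", "1,2,3", "1,5"))
df2 <- data.frame(x = 4:6, y = c("1,2", "1,4", "1,6,7,8"))
filelist <- list(df1, df2)

# Hypothetical helper: split DF[[col]] on sep into <col>_ref, <col>_alt1, ...
split_column <- function(DF, col = "y", sep = ",") {
  # fixed = TRUE treats sep as a literal string rather than a regex
  pieces <- strsplit(as.character(DF[[col]]), sep, fixed = TRUE)
  n <- max(lengths(pieces))
  # Pad each row with NA to a common length, then stack the rows into a matrix
  mat <- do.call(rbind, lapply(pieces, function(p) c(p, rep(NA, n - length(p)))))
  # First piece is the reference, the rest are alternates
  colnames(mat) <- c(paste0(col, "_ref"), sprintf("%s_alt%d", col, seq_len(n - 1)))
  # Keep every other column and append the new ones in place of the original
  data.frame(DF[setdiff(names(DF), col)], mat,
             stringsAsFactors = FALSE, check.names = FALSE)
}

filelist <- lapply(filelist, split_column)
```

The new columns are character vectors; wrap them in as.integer() if numeric values are needed.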

Upvotes: 1

Views: 107

Answers (3)

Fnguyen

Reputation: 1177

Also, with tidyr (separate() comes from tidyr, not dplyr, although the two are often used together) you can use separate() like this:

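library(tidyr)  # provides separate() and re-exports %>%
# into must be a character vector; df1 splits into at most three pieces
df1 %>%
  separate(y, into = c("y_ind", "y_alt1", "y_alt2"), sep = ",", fill = "right")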

Note that into can also be built programmatically (for example with paste0()), so that the number of result columns matches the data without naming each one by hand.
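One way to build into programmatically, sketched under the assumption that tidyr is installed: count the pieces in the longest entry, generate that many names, and let fill = "right" pad shorter rows with NA instead of warning. split_y() is a hypothetical helper, and the y_ref / y_alt names follow the question.

```r
library(tidyr)

df1 <- data.frame(x = 1:3, y = c("1,2", "1,2,3", "1,5"))
df2 <- data.frame(x = 4:6, y = c("1,2", "1,4", "1,6,7,8"))
filelist <- list(df1, df2)

# Hypothetical helper: derive the `into` names from the data itself
split_y <- function(DF) {
  # Number of pieces in the longest entry (commas + 1)
  n <- max(lengths(strsplit(as.character(DF$y), ",", fixed = TRUE)))
  # y_ref, then y_alt1 .. y_alt(n - 1); sprintf() yields nothing when n == 1
  cols <- c("y_ref", sprintf("y_alt%d", seq_len(n - 1)))
  # fill = "right" pads shorter rows with NA on the right without a warning
  separate(DF, y, into = cols, sep = ",", fill = "right")
}

result <- lapply(filelist, split_y)
```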

Upvotes: 0

Cole

Reputation: 11255

Here's a solution that relies on tstrsplit() from data.table:

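library(data.table)
lapply(filelist,
       function(DF) {
         # commas in the longest entry = number of "alt" columns needed
         commas <- max(nchar(as.character(DF$y)) - nchar(gsub(",", "", DF$y)))
         # tstrsplit() transposes strsplit(): one list element per position, NA-padded
         DF[, c('y_ind', paste0('y_alt', seq_len(commas)))] <- tstrsplit(as.character(DF$y), ',')
         DF
       })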
#> [[1]]
#>   x     y y_ind y_alt1 y_alt2
#> 1 1   1,2     1      2   <NA>
#> 2 2 1,2,3     1      2      3
#> 3 3   1,5     1      5   <NA>
#> 
#> [[2]]
#>   x       y y_ind y_alt1 y_alt2 y_alt3
#> 1 4     1,2     1      2   <NA>   <NA>
#> 2 5     1,4     1      4   <NA>   <NA>
#> 3 6 1,6,7,8     1      6      7      8

Created on 2019-09-17 by the reprex package (v0.3.0)
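A variation on the same idea, sketched with data.table's := so the new columns are added by reference, the original y is dropped, and the names follow the question's y_ref / y_alt scheme. It assumes data.table is installed; type.convert = TRUE asks tstrsplit() to return integers rather than strings. The structure of the anonymous function is illustrative, not part of the answer above.

```r
library(data.table)

df1 <- data.frame(x = 1:3, y = c("1,2", "1,2,3", "1,5"))
df2 <- data.frame(x = 4:6, y = c("1,2", "1,4", "1,6,7,8"))
filelist <- list(df1, df2)

result <- lapply(filelist, function(DF) {
  DT <- as.data.table(DF)
  # One list element per position; shorter rows are padded with NA,
  # and type.convert = TRUE turns "1", "2", ... into integers
  pieces <- tstrsplit(as.character(DT$y), ",", fixed = TRUE, type.convert = TRUE)
  cols <- c("y_ref", sprintf("y_alt%d", seq_len(length(pieces) - 1)))
  # Add all new columns by reference, then drop the original y
  DT[, (cols) := pieces]
  DT[, y := NULL]
  DT
})
```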

Upvotes: 0

Ronak Shah

Reputation: 388797

Apply cSplit() (from the splitstackshape package) to the "y" column of each data frame in filelist:

lapply(filelist, splitstackshape::cSplit, "y")

#[[1]]
#   x y_1 y_2 y_3
#1: 1   1   2  NA
#2: 2   1   2   3
#3: 3   1   5  NA

#[[2]]
#   x y_1 y_2 y_3 y_4
#1: 4   1   2  NA  NA
#2: 5   1   4  NA  NA
#3: 6   1   6   7   8
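cSplit() names the pieces y_1, y_2, and so on. If the question's y_ref / y_alt naming is wanted, a small rename step can follow. The sketch below assumes splitstackshape is installed (it imports data.table, whose setnames() is used here), and split_and_rename() is a hypothetical helper.

```r
library(splitstackshape)

df1 <- data.frame(x = 1:3, y = c("1,2", "1,2,3", "1,5"))
df2 <- data.frame(x = 4:6, y = c("1,2", "1,4", "1,6,7,8"))
filelist <- list(df1, df2)

# Hypothetical helper: split y with cSplit(), then rename y_k columns
split_and_rename <- function(DF) {
  # cSplit() returns a data.table with columns y_1, y_2, ...
  out <- cSplit(DF, "y", sep = ",")
  ycols <- grep("^y_[0-9]+$", names(out), value = TRUE)
  k <- as.integer(sub("^y_", "", ycols))
  # y_1 -> y_ref, y_k -> y_alt(k - 1)
  data.table::setnames(out, ycols, ifelse(k == 1, "y_ref", paste0("y_alt", k - 1)))
  out
}

result <- lapply(filelist, split_and_rename)
```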

Upvotes: 2
