Nadeem Hussain
Nadeem Hussain

Reputation: 219

How to separate comma separated values in R in a new row?

I have a dataset as follow:

col1    col2
a        1,2,3
b        ["1","2"]
c        4

I want the output as:

col1     col2
a         1
a         2
a         3
b         1
b         2
c         4

Is it possible to do so in R? If yes, how?

Upvotes: 2

Views: 4911

Answers (2)

guyabel
guyabel

Reputation: 8366

The separate_rows() function in tidyr is the boss for observations with multiple delimited values. As you have a mix of integer and character strings (but just want integers in the final result, set convert = TRUE and use the drop_na() (also in tidyr) to filter out the new rows for where the square parenthesis would otherwise go.

# create data 
library(tidyverse)
d <- data_frame(
  col1 = c("a", "b", "c"), 
  col2 = c("1,2,3", "[\"1\",\"2\"]", 4)
)
d
# # A tibble: 3 x 2
#    col1            col2
#   <chr>           <chr>
# 1     a           1,2,3
# 2     b "[\"1\",\"2\"]"
# 3     c               4

# tidy data
d %>%
  separate_rows(col2, convert = TRUE) %>%
  drop_na()
# # A tibble: 6 x 2
#    col1  col2
#   <chr> <int>
# 1     a     1
# 2     a     2
# 3     a     3
# 4     b     1
# 5     b     2
# 6     c     4

Upvotes: 5

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You could try cSplit from my "splitstackshape" package:

library(splitstackshape)
cSplit(as.data.table(mydf)[, col2 := gsub("[][\"]", "", col2)], 
       "col2", ",", "long")
#    col1 col2
# 1:    a    1
# 2:    a    2
# 3:    a    3
# 4:    b    1
# 5:    b    2
# 6:    c    4

Of course, I'm highly partial to cSplit, but you can also use "dplyr" and unnest from "tidyr":

library(dplyr)
library(tidyr)

mydf %>%
  mutate(col2 = strsplit(gsub("[][\"]", "", col2), ",")) %>%
  unnest(col2)

Or just with "data.table":

library(data.table)
as.data.table(mydf)[, list(
  col2 = unlist(strsplit(gsub("[][\"]", "", col2), ","))), 
  by = col1]

Upvotes: 11

Related Questions