Reputation: 4790
I have a data frame of sequences like this:
dput(df)
structure(list(val = structure(c(3L, 2L, 4L, 1L, 5L, 6L), .Label = c("{36415},{36415}",
"{36415},{85610}", "{36415},{9904}", "{85025,36415}", "{85610},{36415}",
"{8872},{36415}"), class = "factor")), .Names = "val", row.names = c(NA,
-6L), class = "data.frame")
df
val
1 {36415},{9904}
2 {36415},{85610}
3 {85025,36415}
4 {36415},{36415}
5 {85610},{36415}
6 {8872},{36415}
Notice the 3rd row above. The 1st row says that item 1 is followed by item 2, each in a different row of the sequence (each in its own pair of braces). The 3rd row says that items 1 and 2 belong to the same row of the sequence (both inside one pair of braces).
I want to break this data frame into columns like this:
col1 col2
36415 9904
36415 85610
85025,36415 NA
36415 36415
...
Notice that the 3rd row keeps both items together in col1 and has NA in col2.
Is there any way to achieve this?
Upvotes: 1
Views: 170
Reputation: 18691
Here's a one-liner with extract from tidyr. This uses capture groups to specify the column patterns:
library(tidyr)
extract(df, "val", c("col1", "col2"), regex = "\\{([\\d,]+)(?:\\},\\{)?([\\d,]+)?\\}")
Or with str_match from stringr. This uses the exact same regex:
library(stringr)
data.frame(str_match(df$val, "\\{([\\d,]+)(?:\\},\\{)?([\\d,]+)?\\}")[,-1])
Results (first from extract, then from str_match):
col1 col2
1 36415 9904
2 36415 85610
3 85025,36415 <NA>
4 36415 36415
5 85610 36415
6 8872 36415
X1 X2
1 36415 9904
2 36415 85610
3 85025,36415 <NA>
4 36415 36415
5 85610 36415
6 8872 36415
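To see what each capture group picks up, here is a minimal base R sketch of the same idea. It is not the code from this answer: it swaps [\\d,] for [0-9,], adds ^ and $ anchors, and uses sub() with backreferences instead of extract or str_match. The names x, pattern, col1 and col2 are only illustrative.

```r
# Two sample values: one with two brace groups, one with a single group
x <- c("{36415},{9904}", "{85025,36415}")

# Group 1: digits/commas after the first "{"
# (?:...)?: optional, non-capturing "},{" separator
# Group 2: optional digits/commas of the second brace group
pattern <- "^\\{([0-9,]+)(?:\\},\\{)?([0-9,]+)?\\}$"

# Replace the whole string by the content of one capture group
col1 <- sub(pattern, "\\1", x, perl = TRUE)
col2 <- sub(pattern, "\\2", x, perl = TRUE)

# A group that did not take part in the match comes back as "",
# so turn those into NA to mirror the output above
col2[col2 == ""] <- NA

data.frame(col1, col2)
```

Because the first group is greedy and commas are allowed inside it, a value like {85025,36415} ends up entirely in col1, which is the behavior the question asks for.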
Upvotes: 1
Reputation: 39174
A solution from dplyr and tidyr. We can separate the column and then remove any { or }.
library(dplyr)
library(tidyr)
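df2 <- df %>%
  # split on the literal "},{" between groups; pad missing pieces with NA
  separate(val, into = c("col1", "col2"), sep = "\\},\\{", fill = "right") %>%
  # strip leftover braces; across() (dplyr >= 1.0.0) replaces the deprecated funs()
  mutate(across(everything(), ~ gsub("\\{|\\}", "", .x)))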
df2
# col1 col2
# 1 36415 9904
# 2 36415 85610
# 3 85025,36415 <NA>
# 4 36415 36415
# 5 85610 36415
# 6 8872 36415
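The same split-then-strip idea can also be sketched in base R, in case tidyr is not available. This is an assumption-laden variant, not part of the answer above: it strips the outer braces first and then splits on the literal },{ separator, and the helper names inner, parts and out are made up for illustration.

```r
# Rebuild the example data as plain character strings
df <- data.frame(
  val = c("{36415},{9904}", "{36415},{85610}", "{85025,36415}",
          "{36415},{36415}", "{85610},{36415}", "{8872},{36415}"),
  stringsAsFactors = FALSE
)

# Drop the leading "{" and the trailing "}" of every value
inner <- gsub("^\\{|\\}$", "", as.character(df$val))

# Split on the literal separator between two brace groups
parts <- strsplit(inner, "},{", fixed = TRUE)

# Take the first and second piece; indexing past the end yields NA
out <- data.frame(
  col1 = vapply(parts, function(p) p[1], character(1)),
  col2 = vapply(parts, function(p) p[2], character(1)),
  stringsAsFactors = FALSE
)
out
```

Indexing p[2] on a one-element vector returns NA, which plays the role of fill = "right" in separate.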
Upvotes: 1
Reputation: 146070
library(tidyr)
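# Split on the literal "},{" between groups; rows with one group get NA in col2
df = separate(df, col = val, into = c("col1", "col2"), sep = "\\},\\{", fill = "right")
# Remove the remaining braces from every column, keeping the data frame structure
df[] = lapply(df, gsub, pattern = "\\{|\\}", replacement = "")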
df
# col1 col2
# 1 36415 9904
# 2 36415 85610
# 3 85025,36415 <NA>
# 4 36415 36415
# 5 85610 36415
# 6 8872 36415
Upvotes: 2