Reputation: 6200
I have a data.frame that looks like this:
df <- data.frame(col1=c("a","b","c","d"), col2=c("1","1;2;3","5","3;2;5;5;3"), col3=c("0","1;1;0","0","0;0;1;1;0"))
# col1 col2 col3
# 1 a 1 0
# 2 b 1;2;3 1;1;0
# 3 c 5 0
# 4 d 3;2;5;5;3 0;0;1;1;0
In words, some rows have columns with values concatenated by ";". Before reading in the data.frame I don't know which columns will contain the concatenated values but I do know that they will be the same ones for all those rows that have that. I also know that for a rows with columns with concatenated values the number of concatenated values is identical across all such columns (row 2 has 3 values in both col2 and col3, and row 4 has 5 values in these columns)
I would like to create a new data.frame where these concatenated values are split to separate rows. For those rows, the values in the columns that do not have concatenated values should be replicated by the number of concatenated values.
The resulting data.frame will be:
df <- data.frame(col1=c("a","b","b","b","c","d","d","d","d","d"), col2=c("1","1","2","3","5","3","2","5","5","3"), col3=c("0","1","1","0","0","0","0","1","1","0"))
# col1 col2 col3
# 1 a 1 0
# 2 b 1 1
# 3 b 2 1
# 4 b 3 0
# 5 c 5 0
# 6 d 3 0
# 7 d 2 0
# 8 d 5 1
# 9 d 5 1
# 10 d 3 0
Upvotes: 2
Views: 76
Reputation: 193527
This is the kind of data that I wrote my "splitstackshape" package for. You can use cSplit
, like this:
library(splitstackshape)
cSplit(df, c("col2", "col3"), ";", "long")
# col1 col2 col3
# 1: a 1 0
# 2: b 1 1
# 3: b 2 1
# 4: b 3 0
# 5: c 5 0
# 6: d 3 0
# 7: d 2 0
# 8: d 5 1
# 9: d 5 1
# 10: d 3 0
Upvotes: 1
Reputation: 6784
Not as sophisticated as rawr's answer, but perhaps easier to see what is happening
df1 <- data.frame(col1=c("a","b","c","d"),
col2=c("1","1;2;3","5","3;2;5;5;3"),
col3=c("0","1;1;0","0","0;0;1;1;0"),
stringsAsFactors=FALSE)
df1_rows <- nrow(df1)
col1_split <- strsplit(df1$col1,";")
col2_split <- strsplit(df1$col2,";")
col3_split <- strsplit(df1$col3,";")
df2 <- data.frame(col1=character(),
col2=character(),
col3=character(),
stringsAsFactors=FALSE)
for (n in 1:df1_rows){ df2 <- rbind(df2,
data.frame(col1=col1_split[[n]],
col2=col2_split[[n]],
col3=col3_split[[n]],
stringsAsFactors=FALSE))}
which gives
> df2
col1 col2 col3
1 a 1 0
2 b 1 1
3 b 2 1
4 b 3 0
5 c 5 0
6 d 3 0
7 d 2 0
8 d 5 1
9 d 5 1
10 d 3 0
Upvotes: 0
Reputation: 20811
here's one option
df <- data.frame(col1=c("a","b","c","d"), col2=c("1","1;2;3","5","3;2;5;5;3"), col3=c("0","1;1;0","0","0;0;1;1;0"))
df2 <- data.frame(col1=c("a","b","b","b","c","d","d","d","d","d"), col2=c("1","1","2","3","5","3","2","5","5","3"), col3=c("0","1","1","0","0","0","0","1","1","0"))
## reshape `col1` to make it look like the others
v <- Vectorize(gsub)
df$col1 <- v('\\b\\d\\b', df$col1, df$col2)
# col1 col2 col3
# 1 a 1 0
# 2 b;b;b 1;2;3 1;1;0
# 3 c 5 0
# 4 d;d;d;d;d 3;2;5;5;3 0;0;1;1;0
## split on white space or `;` and coerce back into a data frame
data.frame(do.call('cbind', lapply(df, function(x)
unlist(strsplit(as.character(x), '[\\s;]')))))
# col1 col2 col3
# 1 a 1 0
# 2 b 1 1
# 3 b 2 1
# 4 b 3 0
# 5 c 5 0
# 6 d 3 0
# 7 d 2 0
# 8 d 5 1
# 9 d 5 1
# 10 d 3 0
Upvotes: 2