Reputation: 15
I have a format to separate where I will have this data:
df = data.frame(id=c(1,2),name=c('A~B~C','A~B~D'),value=c('1~2~3','1~~2'))
id name value
1 A~B~C 1~2~3
2 A~B~D 1~~2
which is expected to have the following output where the column name is the original column name followed by the text in the name column:
id value_A value_B value_C value_D
1 1 2 3
2 1 2
I manage to achieve the splitting for the by using many nested for loops to process on my data row by row. It works on small sample data but once the data gets huge, the time is an issue.
Also,there could be more than 1 value columns, but they all should map into the same name column. Example output:
id value_A value_B value_C value1_A value1_B value1_C
1 1 2 3 1 2 3
2 1 2 3 1 2 3
Upvotes: 0
Views: 127
Reputation: 17648
You can try dplyr
:
library(tidyverse)
df %>%
separate_rows(name, value, sep = "~") %>%
spread(name, value)
id A B C D
1 1 1 2 3 <NA>
2 2 1 <NA> 2
Instead of NA
you can fill empty cells by anything you specify within fill = ""
Or baseR and reshape2
:
a <- strsplit(as.character(df$name), "~")
b <- strsplit(as.character(df$value), "~")
df2 <- do.call(rbind.data.frame, Map(cbind, df$id, a, b))
library(reshape2)
dcast(df2, V1~V2, value.var = "V3")
A B C D
1 1 2 3 <NA>
2 1 <NA> 2
Upvotes: 2
Reputation: 887088
Here is an option using cSplit/dcast
. Split the rows into 'long' format with cSplit
and dcast
it to 'wide' format
library(splitstackshape)
dcast(cSplit(df, c('name','value'), '~', 'long')[!is.na(value)], id ~ paste0('value_', name))
# id value_A value_B value_C value_D
#1: 1 1 2 3 NA
#2: 2 1 NA NA 2
Upvotes: 0