kaexch
kaexch

Reputation: 15

Column Split based on certain separator format

I have a format to separate where I will have this data:

df = data.frame(id=c(1,2),name=c('A~B~C','A~B~D'),value=c('1~2~3','1~~2'))
id   name      value
1    A~B~C     1~2~3
2    A~B~D     1~~2

which is expected to have the following output where the column name is the original column name followed by the text in the name column:

id   value_A   value_B   value_C  value_D
1      1         2          3      
2      1                            2

I manage to achieve the splitting for the by using many nested for loops to process on my data row by row. It works on small sample data but once the data gets huge, the time is an issue.

Also,there could be more than 1 value columns, but they all should map into the same name column. Example output:

id   value_A   value_B   value_C  value1_A   value1_B   value1_C
1      1         2          3       1           2          3
2      1         2          3       1           2          3   

Upvotes: 0

Views: 127

Answers (2)

Roman
Roman

Reputation: 17648

You can try dplyr:

library(tidyverse)
df %>% 
 separate_rows(name, value, sep = "~") %>% 
 spread(name, value)
  id A B    C    D
1  1 1 2    3 <NA>
2  2 1   <NA>    2

Instead of NA you can fill empty cells by anything you specify within fill = ""

Or baseR and reshape2:

a <- strsplit(as.character(df$name), "~")
b <- strsplit(as.character(df$value), "~")
df2 <- do.call(rbind.data.frame, Map(cbind, df$id, a, b))
library(reshape2)
dcast(df2, V1~V2, value.var = "V3")
  A B C    D   
1 1 2 3    <NA>
2 1   <NA> 2   

Upvotes: 2

akrun
akrun

Reputation: 887088

Here is an option using cSplit/dcast. Split the rows into 'long' format with cSplit and dcast it to 'wide' format

library(splitstackshape)
dcast(cSplit(df, c('name','value'), '~', 'long')[!is.na(value)], id ~ paste0('value_', name))
#   id value_A value_B value_C value_D
#1:  1       1       2       3      NA
#2:  2       1      NA      NA       2

Upvotes: 0

Related Questions