Reputation: 187
I have a data frame, built by extracting data from text files, in which some columns contain more than one value.
I want to split the columns that contain more than one value into 2 columns, like this:
I tried this code, but it generates an error:
db<-separate_rows(db,TYPE,CHRO,EX ,sep=",\\s+")
Error: All nested columns must have the same number of elements.
Upvotes: 0
Views: 62
Reputation: 50738
Note that sample data and expected output don't match; for example, there is no CHRO=c700
entry in your sample data. You also seem to be missing rows. Please check your input/expected output data.
You could use tidyr::separate_rows, e.g.
library(tidyr)

df %>%
  separate_rows(TYPE, sep = ",") %>%
  separate_rows(CHRO, sep = ",") %>%
  separate_rows(EX, sep = ",")
#            TYPE      CHRO     EX
#1       multiple  c.211dup   <NA>
#2       multiple c.3751dup   <NA>
#3       multiple      <NA> exon.2
#4       multiple      <NA> exon.3
#5       multiple      <NA> exon.7
#6   mitocondrial      <NA> exon.3
#7   mitocondrial      <NA> exon.7
#8 multifactorial      <NA>   <NA>
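One detail worth checking: the sample data has a space after the comma in 'c.211dup, c.3751dup', and splitting on a bare "," leaves that space at the start of the second value. The following is a minimal sketch, assuming the same sample data as in this answer read with character columns (stringsAsFactors = FALSE is my addition), that uses the regular expression ",\\s*" so that a comma with or without trailing whitespace is treated as one separator.

```r
library(tidyr)

# Sample data from this answer, kept as character columns (assumption).
df <- read.table(text =
"TYPE CHRO EX
multiple 'c.211dup, c.3751dup' NA
multiple NA exon.2
multiple,mitocondrial NA exon.3,exon.7
multifactorial NA NA", header = TRUE, stringsAsFactors = FALSE)

# ",\\s*" matches a comma followed by zero or more whitespace characters,
# so "a,b" and "a, b" both split without leaving a leading space.
out <- df %>%
  separate_rows(TYPE, sep = ",\\s*") %>%
  separate_rows(CHRO, sep = ",\\s*") %>%
  separate_rows(EX, sep = ",\\s*")

# Inspect the split CHRO values of the first two rows.
out$CHRO[1:2]
```

The pattern ",\\s+" from the question requires at least one whitespace character, so it would not split "exon.3,exon.7"; ",\\s*" covers both spellings.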
Or perhaps use splitstackshape
library(splitstackshape)
library(dplyr)  # group_by_at(), slice()
library(tidyr)  # fill()

df %>%
  cSplit(names(df), direction = "long") %>%
  fill(TYPE) %>%
  group_by_at(names(df)) %>%
  slice(1)
#  TYPE           CHRO      EX
#  <fct>          <fct>     <fct>
#1 mitocondrial   NA        exon.7
#2 multifactorial NA        NA
#3 multiple       c.211dup  NA
#4 multiple       c.3751dup NA
#5 multiple       NA        exon.2
#6 multiple       NA        exon.3
#7 multiple       NA        NA
Note that the two results differ because the order in which the columns are separated matters.
df <- read.table(text =
"TYPE CHRO EX
multiple 'c.211dup, c.3751dup' NA
multiple NA exon.2
multiple,mitocondrial NA exon.3,exon.7
multifactorial NA NA", header = TRUE)
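The error quoted in the question suggests that, within at least one row, the columns split into different numbers of pieces. Here is a rough base-R sketch for checking that on the sample data; the helper names pieces and mismatch are mine, and stringsAsFactors = FALSE is an assumption so that the columns are plain character vectors.

```r
# Sample data from this answer, kept as character columns (assumption).
df <- read.table(text =
"TYPE CHRO EX
multiple 'c.211dup, c.3751dup' NA
multiple NA exon.2
multiple,mitocondrial NA exon.3,exon.7
multifactorial NA NA", header = TRUE, stringsAsFactors = FALSE)

# Number of comma-separated pieces in each cell; an NA cell counts as one piece.
pieces <- sapply(df, function(col) lengths(strsplit(as.character(col), ",\\s*")))
pieces

# Rows in which the three columns disagree on the number of pieces.
mismatch <- apply(pieces, 1, function(r) length(unique(r)) > 1)
which(mismatch)
```

Rows flagged here are the ones where splitting all columns in a single call cannot pair the pieces one-to-one, which is why splitting one column at a time may be the simpler route.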
Upvotes: 1