Separate columns from a data frame

Question

I have a data frame, built by extracting data from text files, in which some columns contain more than one value.

I want to split the columns that contain more than one value into 2 columns, like this

I tried this code, but it generates an error:

db <- separate_rows(db, TYPE, CHRO, EX, sep = ",\\s+")
Error: All nested columns must have the same number of elements.

Maurits Evers · Accepted Answer

Note that sample data and expected output don't match; for example, there is no CHRO=c700 entry in your sample data. You also seem to be missing rows. Please check your input/expected output data.

You could use tidyr::separate_rows, e.g.

library(tidyr)
df %>%
    separate_rows(TYPE, sep = ",") %>%
    separate_rows(CHRO, sep = ",") %>%
    separate_rows(EX, sep = ",")
#            TYPE       CHRO     EX
#1       multiple   c.211dup   <NA>
#2       multiple  c.3751dup   <NA>
#3       multiple       <NA> exon.2
#4       multiple       <NA> exon.3
#5       multiple       <NA> exon.7
#6   mitocondrial       <NA> exon.3
#7   mitocondrial       <NA> exon.7
#8 multifactorial       <NA>   <NA>
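
As a minimal sketch of why passing several columns to a single separate_rows call can fail while chaining one call per column works: the one-row data frame `toy` below is hypothetical (it is not part of the question's data) and is chosen so that the two columns split into different numbers of pieces. The exact wording of the error depends on the installed tidyr version, so the sketch only checks that an error is raised.

```r
library(tidyr)

# Hypothetical toy data: column a splits into 2 pieces, column b into 3.
toy <- data.frame(a = "x,y", b = "1,2,3", stringsAsFactors = FALSE)

# Splitting both columns in one call pairs the pieces up position by
# position within each row, so 2 pieces cannot be matched with 3 pieces.
paired <- tryCatch(
    separate_rows(toy, a, b, sep = ","),
    error = function(e) e
)
inherits(paired, "error")

# Splitting one column at a time has no such constraint: each call
# multiplies the rows, so the result holds every combination of pieces.
crossed <- toy %>%
    separate_rows(a, sep = ",") %>%
    separate_rows(b, sep = ",")
nrow(crossed)
```

The chained form therefore trades the per-row matching requirement for a larger result, so it is worth checking that every combination of values is really what you want.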

Or perhaps use splitstackshape

library(splitstackshape)
library(dplyr)   # group_by_at(), slice()
library(tidyr)   # fill()
df %>%
    cSplit(names(df), direction = "long") %>%
    fill(TYPE) %>%
    group_by_at(names(df)) %>%
    slice(1)
#  TYPE           CHRO      EX
#1 mitocondrial   NA        exon.7
#2 multifactorial NA        NA
#3 multiple       c.211dup  NA
#4 multiple       c.3751dup NA
#5 multiple       NA        exon.2
#6 multiple       NA        exon.3
#7 multiple       NA        NA

Note that results are different because the order of separating columns matters.

Sample data

df <- read.table(text =
    "TYPE                   CHRO                       EX
        multiple    'c.211dup, c.3751dup'                       NA
        multiple                     NA                   exon.2
        multiple,mitocondrial        NA                   exon.3,exon.7
  multifactorial                     NA                       NA", header = TRUE)
