R separate into multiple columns, transpose and concatenate

Question

I have a dataframe that looks like the following:

Pos HG00096 HG00097 HG00099 ......and so on(about 2000 variables)
230765867 0|0 0|0 0|1 ......
230765926 1|0 0|1 0|0 ......
230765941 0|0 1|1 0|0 .......

and so on about 4000 observations

mydfexample <- tribble(
  ~Pos,~HG00096,~HG00097,~HG00099,
  "230765867","0|0","0|0","0|1",
  "230765926","1|0","0|1","0|0",
  "230765941","0|0","1|1","0|0",
  )

my goal is to separate each column into two (sep = "|") and concatenate them as one string for each column as shown below.

HG00096.1 010.....
HG000096.2 000.....
HG00097.1 001....
HG00097.2 011......
HG00099 .1 000......
HG00099.2 100.......

So far, I was able to separate them and put it into two separate columns (thanks to SO). Following is the code:

f= function(x){
  mydfexample %>% 
    dplyr::select(Pos,x) %>%
    separate(x, paste0(x,c(".1",".2")))
}

names(mydfexample[2:4]) %>% 
  map(f) %>%
  reduce(left_join, by="Pos")

Next, my thought was to do a transpose and unite to get the final information. I was trying to follow How to transpose a dataframe in tidyverse? but that would change order of columns in the final dataframe. I want to keep the order of 0's and 1's. Right now it is sorted by position and would like to keep the same order. Any pointers or help is greatly appreciated.

akrun · Accepted Answer

We convert to 'long' format with pivot_longer and separate into two columns

library(dplyr)
library(tidyr)
mydfexample %>% 
    pivot_longer(cols = -Pos) %>% 
    separate(value, into = c('value1', 'value2'))

Based on the expected output showed

library(stringr)
mydfexample %>% 
         pivot_longer(cols = -Pos)  %>%
          separate(value, into = c('value1', 'value2')) %>% 
          group_by(name) %>% 
          summarise_at(vars(starts_with('value')), str_c, collapse="") %>%
          pivot_longer(cols = -name, names_to = "Name") %>% 
          select(-Name) %>% 
          mutate(name = make.unique(name))
# A tibble: 6 x 2
#  name      value
#       
#1 HG00096   010  
#2 HG00096.1 000  
#3 HG00097   001  
#4 HG00097.1 011  
#5 HG00099   000  
#6 HG00099.1 100

R separate into multiple columns, transpose and concatenate

Answers (1)

Related Questions