Reputation: 1043
I have a dataframe that looks like the following:
Pos HG00096 HG00097 HG00099 ......and so on(about 2000 variables)
230765867 0|0 0|0 0|1 ......
230765926 1|0 0|1 0|0 ......
230765941 0|0 1|1 0|0 .......
and so on about 4000 observations
mydfexample <- tribble(
~Pos,~HG00096,~HG00097,~HG00099,
"230765867","0|0","0|0","0|1",
"230765926","1|0","0|1","0|0",
"230765941","0|0","1|1","0|0",
)
my goal is to separate each column into two (sep = "|") and concatenate them as one string for each column as shown below.
HG00096.1 010.....
HG000096.2 000.....
HG00097.1 001....
HG00097.2 011......
HG00099 .1 000......
HG00099.2 100.......
So far, I was able to separate them and put it into two separate columns (thanks to SO). Following is the code:
f= function(x){
mydfexample %>%
dplyr::select(Pos,x) %>%
separate(x, paste0(x,c(".1",".2")))
}
names(mydfexample[2:4]) %>%
map(f) %>%
reduce(left_join, by="Pos")
Next, my thought was to do a transpose and unite to get the final information. I was trying to follow How to transpose a dataframe in tidyverse? but that would change order of columns in the final dataframe. I want to keep the order of 0's and 1's. Right now it is sorted by position and would like to keep the same order. Any pointers or help is greatly appreciated.
Upvotes: 1
Views: 347
Reputation: 886938
We convert to 'long' format with pivot_longer
and separate
into two columns
library(dplyr)
library(tidyr)
mydfexample %>%
pivot_longer(cols = -Pos) %>%
separate(value, into = c('value1', 'value2'))
Based on the expected output showed
library(stringr)
mydfexample %>%
pivot_longer(cols = -Pos) %>%
separate(value, into = c('value1', 'value2')) %>%
group_by(name) %>%
summarise_at(vars(starts_with('value')), str_c, collapse="") %>%
pivot_longer(cols = -name, names_to = "Name") %>%
select(-Name) %>%
mutate(name = make.unique(name))
# A tibble: 6 x 2
# name value
# <chr> <chr>
#1 HG00096 010
#2 HG00096.1 000
#3 HG00097 001
#4 HG00097.1 011
#5 HG00099 000
#6 HG00099.1 100
Upvotes: 3