Reputation: 2583
suppose to have a table like this:
my gene my_value1 my_value2 Group A.1 0.987 0.234 0 B.1 0.1 0.2 0 A.2 0.11 0.45 1 C 0.22 0.98 1 B.2 0.4 0.12 1 D 0.21 0.67 2 B.3 0.56 0.6 2 E 0.4 0 2
and you want the following output
my gene my_value1_Group0 my_value2_Group0 my_value1_Group1 my_value2_Group1 my_value1_Group2 my_value2_Group2 A 0.987 0.234 0.11 0.45 NA NA B 0.1 0.2 0.4 0.12 0.56 0.6 C NA NA 0.22 0.98 NA NA D NA NA NA NA 0.21 0.67 E NA NA NA NA 0.4 0
Is there a way to format a table like the desired output?
Thank you in advance. Totally the data.frame contains 20.000 rows.
Upvotes: 1
Views: 49
Reputation: 887501
Here is an option with pivot_wider
. Remove the .
. followed by digits in the 'mygene' column, and use pivot_wider
from tidyr
to reshape multiple columns to 'wide' format
library(stringr)
library(dplyr)
library(tidyr)
df1 %>%
mutate(mygene = str_remove(mygene, "\\.\\d+"), Group = str_c("Group", Group)) %>%
pivot_wider(names_from = Group, values_from = c(my_value1, my_value2))
# A tibble: 5 x 7
# mygene my_value1_Group0 my_value1_Group1 my_value1_Group2 my_value2_Group0 my_value2_Group1 my_value2_Group2
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 0.987 0.11 NA 0.234 0.45 NA
#2 B 0.1 0.4 0.56 0.2 0.12 0.6
#3 C NA 0.22 NA NA 0.98 NA
#4 D NA NA 0.21 NA NA 0.67
#5 E NA NA 0.4 NA NA 0
df1 <- structure(list(mygene = c("A.1", "B.1", "A.2", "C", "B.2", "D",
"B.3", "E"), my_value1 = c(0.987, 0.1, 0.11, 0.22, 0.4, 0.21,
0.56, 0.4), my_value2 = c(0.234, 0.2, 0.45, 0.98, 0.12, 0.67,
0.6, 0), Group = c(0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L)),
class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 2