NewUsr_stat
NewUsr_stat

Reputation: 2583

Format an output table by groups

suppose to have a table like this:

 my gene       my_value1       my_value2       Group     
   A.1            0.987          0.234           0        
   B.1            0.1            0.2             0        
   A.2            0.11           0.45            1     
   C              0.22           0.98            1       
   B.2            0.4            0.12            1       
   D              0.21           0.67            2  
   B.3            0.56           0.6             2  
   E              0.4            0               2              

and you want the following output

 my gene       my_value1_Group0       my_value2_Group0     my_value1_Group1     my_value2_Group1     my_value1_Group2           my_value2_Group2   
   A               0.987                   0.234                  0.11                0.45                 NA                           NA
   B                0.1                    0.2                    0.4                 0.12                 0.56                        0.6   
   C                NA                     NA                     0.22                0.98                  NA                          NA 
   D                NA                     NA                     NA                   NA                  0.21                        0.67
   E                NA                     NA                     NA                   NA                   0.4                         0

Is there a way to format a table like the desired output?

Thank you in advance. Totally the data.frame contains 20.000 rows.

Upvotes: 1

Views: 49

Answers (1)

akrun
akrun

Reputation: 887501

Here is an option with pivot_wider. Remove the .. followed by digits in the 'mygene' column, and use pivot_wider from tidyr to reshape multiple columns to 'wide' format

library(stringr)
library(dplyr)
library(tidyr)
df1 %>% 
  mutate(mygene = str_remove(mygene, "\\.\\d+"), Group = str_c("Group", Group)) %>% 
  pivot_wider(names_from = Group, values_from = c(my_value1, my_value2))
# A tibble: 5 x 7
#  mygene my_value1_Group0 my_value1_Group1 my_value1_Group2 my_value2_Group0 my_value2_Group1 my_value2_Group2
#  <chr>             <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
#1 A                 0.987             0.11            NA               0.234             0.45            NA   
#2 B                 0.1               0.4              0.56            0.2               0.12             0.6 
#3 C                NA                 0.22            NA              NA                 0.98            NA   
#4 D                NA                NA                0.21           NA                NA                0.67
#5 E                NA                NA                0.4            NA                NA                0   

data

df1 <- structure(list(mygene = c("A.1", "B.1", "A.2", "C", "B.2", "D", 
"B.3", "E"), my_value1 = c(0.987, 0.1, 0.11, 0.22, 0.4, 0.21, 
0.56, 0.4), my_value2 = c(0.234, 0.2, 0.45, 0.98, 0.12, 0.67, 
0.6, 0), Group = c(0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L)), 
 class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 2

Related Questions