keegan
keegan

Reputation: 2992

Reshaping dataframes

Suppose I have this untidy dataframe with rownames

df<-data.frame(A=rnorm(2),B=rnorm(2),C=rnorm(2))
rownames(df)<-c('male','female')
> df                
            A         B          C
male   0.10138957  1.036535  2.0708251
female 0.08524181 -1.140275 -0.3800585

I'd like to get it into to tidy form such that the columns A, B, and C are collapsed into a single column and there's an additional column for Gender. It might look like this

> desired_df
   where    value   Gender
1    A  0.10138957   male
2    A  0.08524181 female
3    B  1.03653464   male
4    B -1.14027549 female
5    C  2.07082513   male
6    C -0.38005851 female

Here's one way to achieve this using tidyr.

desired_df<-gather(df,where,value)
desired_df['Gender']<-rep(c('male','female'),nrow(df)/2)

But I had to convert the row names into a column in a rather inelegant way. My question is - What additional functions in tidyr or reshape would have helped with this? Or could my call to gather have been done differently in order to achieve this result in one line?

Upvotes: 1

Views: 636

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193497

You can get to your "desired_df" in several ways. Here's one with "dplyr" and "tidyr" where we combine the steps a little more conveniently:

library(tidyr)
library(dplyr)
df %>% cbind(rn = rownames(df)) %>% gather(where, value, A:C)

Or, you can use the matrix method for melt from "reshape2":

library(reshape2)
melt(as.matrix(df))
#     Var1 Var2       value
# 1   male    A -1.58936590
# 2 female    A -2.11861101
# 3   male    B -0.04883463
# 4 female    B  1.77685001
# 5   male    C  0.22472189
# 6 female    C  1.35717680

In base R, you can use stack and just recycle the rownames:

cbind(rn = rownames(df), stack(df))
#       rn      values ind
# 1   male -1.58936590   A
# 2 female -2.11861101   A
# 3   male -0.04883463   B
# 4 female  1.77685001   B
# 5   male  0.22472189   C
# 6 female  1.35717680   C

Or, with some slight trickery, making use of the data.frame method for tables, you can do:

data.frame(as.table(as.matrix(df)))
#     Var1 Var2        Freq
# 1   male    A -1.58936590
# 2 female    A -2.11861101
# 3   male    B -0.04883463
# 4 female    B  1.77685001
# 5   male    C  0.22472189
# 6 female    C  1.35717680

Upvotes: 4

Related Questions