CodeLearner

Reputation: 439

restructure data frame in R

I'm wondering if there is an easy way to restructure some data I have. I currently have a data frame that looks like this...

Year    Cat   Number
2001    A     15
2001    B     2
2002    A     4
2002    B     12

But what I ultimately want is to have it in this shape...

Year    Cat    Number    Cat    Number
2001    A      15        B      2
2002    A      4         B      12

Is there a simple way to do this?

Thanks in advance

:)

Upvotes: 2

Views: 1221

Answers (1)

akrun

Reputation: 886938

One way would be to use dcast/melt from reshape2. In the code below, I first create a sequence of numbers (the indx column) within each Year using transform and ave. The transformed dataset is then melted, keeping Year and indx as the id variables. Finally, the long-format dataset is reshaped to wide format using dcast. If you don't need the numeric suffix (_1, _2) in the column names, you can use gsub to remove it. Note that melting Cat and Number together stores both in a single value column, so the Number columns in the result are character.

library(reshape2)
# add a within-Year index, melt to long format, then cast back to wide
res <- dcast(melt(transform(df, indx=ave(seq_along(Year), Year, FUN=seq_along)),
        id.vars=c("Year", "indx")), Year~variable+indx, value.var="value")
# strip the "_1"/"_2" suffix from the column names
colnames(res) <- gsub("_.*", "", colnames(res))
res
#   Year Cat Cat Number Number
# 1 2001   A   B     15      2
# 2 2002   A   B      4     12

Or using dplyr/tidyr. Here, the idea is similar to the above. After grouping by the Year column, generate an indx column using mutate, reshape to long format with gather, unite the two columns into a single column VarIndx, and then reshape back to wide format with spread. In the last step, mutate_each converts the columns whose names start with Number to numeric.

library(dplyr)
library(tidyr)

res1 <- df %>%
    group_by(Year) %>%
    mutate(indx=row_number()) %>%
    gather("Var", "Val", Cat:Number) %>%
    unite(VarIndx, Var, indx) %>%
    spread(VarIndx, Val) %>%
    mutate_each(funs(as.numeric), starts_with("Number"))

res1
# Source: local data frame [2 x 5]
#
#   Year Cat_1 Cat_2 Number_1 Number_2
# 1 2001     A     B       15        2
# 2 2002     A     B        4       12
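In more recent versions of tidyr (1.0.0 or later, which is an assumption about your setup), gather and spread have been superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same idea with pivot_wider; the name res2 is just illustrative. Because the data never passes through a single long value column, the Number columns should keep their integer type and no conversion step is needed.

```r
library(dplyr)
library(tidyr)

# same example data as in the "data" section
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

res2 <- df %>%
    group_by(Year) %>%
    mutate(indx = row_number()) %>%   # within-Year index: 1, 2, ...
    ungroup() %>%
    # spread Cat and Number into one column per index value
    pivot_wider(names_from = indx, values_from = c(Cat, Number))

res2
```

The resulting columns are named Cat_1, Cat_2, Number_1 and Number_2, as in res1 above.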

Or you can create an index variable .id using getanID from splitstackshape (suggested in the comments by @Ananda Mahto, the author of splitstackshape) and then use reshape from base R.

  library(splitstackshape)
  reshape(getanID(df, "Year"), direction="wide", idvar="Year", timevar=".id")
  #   Year Cat.1 Number.1 Cat.2 Number.2
  #1: 2001     A       15     B        2
  #2: 2002     A        4     B       12
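If you would rather not load an extra package, the index can also be built with ave, as in the first approach, and passed to reshape directly. This is only a sketch of that combination; the names df2 and wide are illustrative.

```r
# same example data as in the "data" section
df <- data.frame(Year = c(2001L, 2001L, 2002L, 2002L),
                 Cat = c("A", "B", "A", "B"),
                 Number = c(15L, 2L, 4L, 12L),
                 stringsAsFactors = FALSE)

# within-Year index, playing the role of getanID's .id column
df2 <- transform(df, indx = ave(seq_along(Year), Year, FUN = seq_along))

# one row per Year, one set of Cat/Number columns per index value
wide <- reshape(df2, direction = "wide", idvar = "Year", timevar = "indx")
wide
```

The columns come out as Year, Cat.1, Number.1, Cat.2, Number.2, matching the getanID version.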

data

df <-   structure(list(Year = c(2001L, 2001L, 2002L, 2002L), Cat = c("A", 
"B", "A", "B"), Number = c(15L, 2L, 4L, 12L)), .Names = c("Year", 
 "Cat", "Number"), class = "data.frame", row.names = c(NA, -4L
 ))

Upvotes: 2
