DirtStats
DirtStats

Reputation: 599

How to make data in a single column (long) with multiple, nested group categories wide

I've got a mess of data and am trying to efficiently wrangle it into shape. Here's a simplified short sample of the general format of my data.frame right now. The main difference is that I have a few more data labels like Label1 for my sampling units - each has a set of data similar to the data.frame I'm including but in my situation they are all in the same data.frame. I don't think that will complicate the reformatting so I've just included the single sampling unit of mock data here. StatsType levels Ave, Max, and Min are effectively nested within MeasureType.

tastycheez<-data.frame(
  Day=rep((1:3),9),
  StatsType=rep(c(rep("Ave",3),rep("Max",3),rep("Min",3)),3),
  MeasureType=rep(c("Temp","H2O","Tastiness"),each=9),
  Data_values=1:27,
  Label1=rep("SamplingU1",27))

Ultimately, I would like a data frame where for each sampling unit and each Day there are columns holding the Data_values for my categories, like this:

 Day     Label1   Ave.Temp Ave.H2O Ave.Tastiness  Max.Temp ...
 1   SamplingU1          1      10            19         4 ...
 2   SamplingU1          2      11            20         5 ...

I think some combination of functions from reshape,dplyr,tidyr, and/or data.table could do the job but I can't figure out how to code it. Here's what I've tried:

First, I spread the tastycheez (yum!), and that got me partway:

test<-spread(tastycheez,StatsType,Data_values)

Now I'm trying to spread it again or to cast, but with no luck:

test2<-spread(test,MeasureType,(Ave,Max,Min))

test2 <- recast(Day ~ MeasureType+c(Ave,Max,Min), data=test)

(I also tried melting the tastycheez but the results were a sticky, gooey mess and my tongue got burnt. that doesn't seem to be the right function for this.)

If you hate my puns please excuse them, I really can't figure this out!

Here are a couple related questions: Combining two subgroups of data in the same dataframe How can I spread repeated measures of multiple variables into wide format?

Upvotes: 2

Views: 224

Answers (1)

Frank
Frank

Reputation: 66819

reshape2 You could use dcast from reshape2:

library(reshape2)
dcast(tastycheez, 
  Day + Label1 ~ paste(StatsType, MeasureType, sep="."), 
  value.var = "Data_values")

which gives

  Day     Label1 Ave.H2O Ave.Tastiness Ave.Temp Max.H2O Max.Tastiness Max.Temp Min.H2O Min.Tastiness Min.Temp
1   1 SamplingU1      10            19        1      13            22        4      16            25        7
2   2 SamplingU1      11            20        2      14            23        5      17            26        8
3   3 SamplingU1      12            21        3      15            24        6      18            27        9

tidyr Stealing @DavidArenburg's comment, here's the tidyr way:

library(tidyr)
tastycheez %>%
       unite(temp, StatsType, MeasureType, sep = ".") %>% 
       spread(temp, Data_values)

which gives

  Day     Label1 Ave.H2O Ave.Tastiness Ave.Temp Max.H2O Max.Tastiness Max.Temp Min.H2O Min.Tastiness Min.Temp
1   1 SamplingU1      10            19        1      13            22        4      16            25        7
2   2 SamplingU1      11            20        2      14            23        5      17            26        8
3   3 SamplingU1      12            21        3      15            24        6      18            27        9

Upvotes: 4

Related Questions