jrara
jrara

Reputation: 16981

How to reshape data into wide format?

I have a data set like this:

structure(list(var1 = c("APE", "APE", "APE", "APE", "APE", "APE", "GIT",
"APE", "APE", "APE", "APE", "APE", "APE", "APE", "GIT"), var2 = c("AVVAL",
"AULASU", "APALA", "AEA", "ATUPVA", "ASATAP", "ADLO", "AKOKU", "AVVAL",
"AULASU", "APALA", "AEA", "ATUPVA", "ASATAP", "ADLO"), var3 = c(NA,
NA, 1L, 101L, 17122009L, 1L, NA, 684L, NA, NA, 1L, 10L, 17122L,
1L, NA)), .Names = c("var1", "var2", "var3"), row.names = c(NA,
15L), class = "data.frame")

How can I reshape this data into wide format? I tried this

reshape(h, idvar="var2", v.names="var3", timevar="var1", direction="wide")

but it is not giving me a correct results. The correct result is:

  var1 ADLO AEA AKOKU APALA ASATAP   ATUPVA AULASU AVVAL
1  APE  NaN 101   NA      1      1 17122009     NA    NA
2  APE  NaN  10   684     1      1    17122     NA    NA
3  GIT   NA NaN   NaN   NaN    NaN      NaN    NaN   NaN
4  GIT   NA NaN   NaN   NaN    NaN      NaN    NaN   NaN

Upvotes: 0

Views: 573

Answers (2)

Oscar Perpiñán
Oscar Perpiñán

Reputation: 4511

I add a new variable due to the meaning of GIT:

 dat$id <- cumsum(dat$var1=='GIT')

First I do the aggregation:

datMean <- aggregate(var3 ~ var2 * id, data=dat, FUN=mean)

> datMean
    var2 id     var3
1    AEA  0      101
2  APALA  0        1
3 ASATAP  0        1
4 ATUPVA  0 17122009
5    AEA  1       10
6  AKOKU  1      684
7  APALA  1        1
8 ASATAP  1        1
9 ATUPVA  1    17122

and then the conversion from long to wide:

datWide <- reshape(datMean, direction='wide', idvar='id', timevar='var2')

> datWide
  id var3.AEA var3.APALA var3.ASATAP var3.ATUPVA var3.AKOKU
1  0      101          1           1    17122009         NA
5  1       10          1           1       17122        684

Upvotes: 0

Andrie
Andrie

Reputation: 179388

Edited

The only way I can get to your expected results is to add a new column to the data.frame. It seems to me that there is some information implicit about your data that isn't contained in the data. In other words, there must be some kind of grouping variable that identifies certain records as belonging together.

Since I can't double-guess what this information is, in my answer I am going to assume that each occurrence of GIT marks the end of a record:

x <- grep("GIT", h$var1)
h$rec <- rep(seq_along(x), times=c(x[1], diff(x)))

library(reshape2)
mh <- melt(h, measure.vars="var3")
cast(mh, rec+var1~var2, id.var="rec", measure.var="value", fun.aggregate=mean)

  rec var1 ADLO AEA AKOKU APALA ASATAP   ATUPVA AULASU AVVAL
1   1  APE  NaN  10   NaN     1      1 17122009     NA    NA
2   1  GIT   NA NaN   NaN   NaN    NaN      NaN    NaN   NaN
3   2  APE  NaN  10   684     1      1 17122009     NA    NA
4   2  GIT   NA NaN   NaN   NaN    NaN      NaN    NaN   Na

Original answer

I find the package reshape2 much easier to comprehend than the built-in reshape function. This package provides two functions:

  • melt to make a wide data.frame tall
  • cast to make a tall data.frame wide

In your case you need cast:

library(reshape2)
cast(h, var1~var2, value="var3", fun.aggregate=mean)

  var1 ADLO AEA AKOKU APALA ASATAP   ATUPVA AULASU AVVAL
1  APE  NaN  10   684     1      1 17122009     NA    NA
2  GIT   NA NaN   NaN   NaN    NaN      NaN    NaN   NaN

Upvotes: 3

Related Questions