Reputation:

Panel Data formation in R

I have a data set with large number of companies. And for each company I estimated a measure for each month over a period of time i.e. year. But now I want to restructure this data set into a panel data. Below I illustrate the data set I have.

Month   Beta    A   B
Jan-05  Beta1   1   32
Jan-05  Beta2   2   32
Jan-05  Beta3   3   32
Jan-05  Beta4   4   32
Jan-05  Beta5   5   32
Jan-05  Beta6   6   32
Feb-05  Beta1   7   32
Feb-05  Beta2   8   32
Feb-05  Beta3   9   32
Feb-05  Beta4   10  32
Feb-05  Beta5   11  32
Feb-05  Beta6   12  32
Mar-05  Beta1   13  32
Mar-05  Beta2   14  32
Mar-05  Beta3   15  32
Mar-05  Beta4   16  32
Mar-05  Beta5   17  32
Mar-05  Beta6   18  32
Apr-05  Beta1   19  32
Apr-05  Beta2   20  32
Apr-05  Beta3   21  32
Apr-05  Beta4   22  32
Apr-05  Beta5   23  32
Apr-05  Beta6   24  32
May-05  Beta1   25  32
May-05  Beta2   26  32
May-05  Beta3   27  32
May-05  Beta4   28  32
May-05  Beta5   29  32
May-05  Beta6   30  32
Jun-05  Beta1   31  32
Jun-05  Beta2   32  32
Jun-05  Beta3   33  32
Jun-05  Beta4   34  32
Jun-05  Beta5   35  32
Jun-05  Beta6   36  32
Jul-05  Beta1   37  32
Jul-05  Beta2   38  32
Jul-05  Beta3   39  32
Jul-05  Beta4   40  32
Jul-05  Beta5   41  32
Jul-05  Beta6   42  32
Aug-05  Beta1   43  32
Aug-05  Beta2   44  32
Aug-05  Beta3   45  32
Aug-05  Beta4   46  32
Aug-05  Beta5   47  32
Aug-05  Beta6   48  32
Sep-05  Beta1   49  32
Sep-05  Beta2   50  32
Sep-05  Beta3   51  32
Sep-05  Beta4   52  32
Sep-05  Beta5   53  32
Sep-05  Beta6   54  32
Oct-05  Beta1   55  32
Oct-05  Beta2   56  32
Oct-05  Beta3   57  32
Oct-05  Beta4   58  32
Oct-05  Beta5   59  32
Oct-05  Beta6   60  32
Nov-05  Beta1   61  32
Nov-05  Beta2   62  32
Nov-05  Beta3   63  32
Nov-05  Beta4   64  32
Nov-05  Beta5   65  32
Nov-05  Beta6   66  32
Dec-05  Beta1   67  32
Dec-05  Beta2   68  32
Dec-05  Beta3   69  32
Dec-05  Beta4   70  32
Dec-05  Beta5   71  32
Dec-05  Beta6   72  32

As it can be observed six separate measures Beta1, Beta2, Beta3, Beta4, Beta5 and Beta6 have been calculated for each month. Now I want to reshape this into as follows

Month   Company Beta1   Beta2   Beta3   Beta4   Beta5   Beta6
Jan-05  A        1         2      3       4        5    6
Feb-05  A        7         8      9       10       11   12
Mar-05  A       13        14      15      16       17   18
Apr-05  A       19        20      21      22       23   24
May-05  A       25        26      27      28       29   30
Jun-05  A       31        32      33      34       35   36
Jul-05  A       37        38      39      40       41   42
Aug-05  A       43        44      45      46       47   48
Sep-05  A       49        50      51      52       53   54
Oct-05  A       56        57      58      59       60   61
Nov-05  A       62        63       64     65       66   67
Dec-05  A       68        69       70      71     72    73
Jan-05  B       32        32       32      32     32    32
Feb-05  B       32        32       32      32     32    32
Mar-05  B       32        32       32      32     32    32
Apr-05  B       32        32       32      32     32    32
May-05  B       32        32       32      32     32    32
Jun-05  B       32        32       32      32     32    32
Jul-05  B       32        32       32      32     32    32
Aug-05  B       32        32       32      32     32    32
Sep-05  B       32        32       32      32     32    32
Oct-05  B       32        32       32      32     32    32
Nov-05  B       32        32       32      32     32    32
Dec-05  B       32        32       32      32     32    32

My original data set has over thousand columns. I have referred to enter link description here but my data set is different from this one. I would really appreciate your help in this regard.

Upvotes: 1

Answers (3)

user3356873

Reputation: 63

You can find the same answer in the Reshape pdf which is available online. Please go through this link.
https://cran.r-project.org/web/packages/reshape/reshape.pdf

https://cran.r-project.org/web/packages/reshape2/reshape2.pdf

# This is an example. 
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
cast(aqm, day ~ month ~ variable)
# Not only sorting there are many other possiblities. 
cast(aqm, month ~ variable, mean)
cast(aqm, month ~ . | variable, mean)
cast(aqm, month ~ variable, mean, margins=c("grand_row", "grand_col"))
cast(aqm, day ~ month, mean, subset=variable=="ozone")
cast(aqm, month ~ variable, range)
cast(aqm, month ~ variable + result_variable, range)
cast(aqm, variable ~ month ~ result_variable,range)

If you follow all the examples, there is your answer. Reshape is handy package. May this helps you.

Upvotes: 0

Pierre L

Reputation: 28451

This is a classic recast. The id.var argument goes to melt, the other arguments go to dcast.

The first function melt produces a long data frame specified with id.var=c("Month", "Beta"). Therefore the output will combine columns "A" and "B" in a column called variable and create a separate value column. With that melted data.frame We can then cast wide with dcast. The formula Month+variable ~ Beta has id variables on the Left of the tilde(~) and measure variables on the right. Both functions are combined in recast for convenience.

If the order matters, convert the Month column to dates and order at the end:

library(zoo)
library(reshape2)
df[,1] <- as.yearmon(df[,1], "%b-%y")
newdf <- recast(df, id.var=c("Month", "Beta"), Month+variable~Beta, value.var="value")
newdf <- newdf[order(newdf$variable,newdf$Month),]
newdf[,1] <- format(newdf$Month, "%b-%y")
newdf
#     Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
# 1  Jan-05        A     1     2     3     4     5     6
# 3  Feb-05        A     7     8     9    10    11    12
# 5  Mar-05        A    13    14    15    16    17    18
# 7  Apr-05        A    19    20    21    22    23    24
# 9  May-05        A    25    26    27    28    29    30
# 11 Jun-05        A    31    32    33    34    35    36
# 13 Jul-05        A    37    38    39    40    41    42
# 15 Aug-05        A    43    44    45    46    47    48
# 17 Sep-05        A    49    50    51    52    53    54
# 19 Oct-05        A    55    56    57    58    59    60
# 21 Nov-05        A    61    62    63    64    65    66
# 23 Dec-05        A    67    68    69    70    71    72
# 2  Jan-05        B    32    32    32    32    32    32
# 4  Feb-05        B    32    32    32    32    32    32
# 6  Mar-05        B    32    32    32    32    32    32
# 8  Apr-05        B    32    32    32    32    32    32
# 10 May-05        B    32    32    32    32    32    32
# 12 Jun-05        B    32    32    32    32    32    32
# 14 Jul-05        B    32    32    32    32    32    32
# 16 Aug-05        B    32    32    32    32    32    32
# 18 Sep-05        B    32    32    32    32    32    32
# 20 Oct-05        B    32    32    32    32    32    32
# 22 Nov-05        B    32    32    32    32    32    32
# 24 Dec-05        B    32    32    32    32    32    32

Upvotes: 3

Brandon Loudermilk

Reputation: 970

Using functions from package {tidyr}:

df <- data.frame(Month = c(rep("Jan-05", 6), rep("Feb-05",6)),  
                 Beta = c(paste0("Beta",1:6), paste0("Beta",1:6)), 
                 A  = 1:12,  
                 B=rep(32,12))

df2 <- tidyr::gather(df, key = Company, value = val, A, B)

df3 <- tidyr::spread(data = df2, key = Beta, value = val)

##   Month Company Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
##1 Feb-05       A     7     8     9    10    11    12
##2 Feb-05       B    32    32    32    32    32    32
##3 Jan-05       A     1     2     3     4     5     6
##4 Jan-05       B    32    32    32    32    32    32

Upvotes: 1

Panel Data formation in R

Answers (3)

Related Questions