Reputation:
I have a data set with large number of companies. And for each company I estimated a measure for each month over a period of time i.e. year. But now I want to restructure this data set into a panel data. Below I illustrate the data set I have.
Month Beta A B
Jan-05 Beta1 1 32
Jan-05 Beta2 2 32
Jan-05 Beta3 3 32
Jan-05 Beta4 4 32
Jan-05 Beta5 5 32
Jan-05 Beta6 6 32
Feb-05 Beta1 7 32
Feb-05 Beta2 8 32
Feb-05 Beta3 9 32
Feb-05 Beta4 10 32
Feb-05 Beta5 11 32
Feb-05 Beta6 12 32
Mar-05 Beta1 13 32
Mar-05 Beta2 14 32
Mar-05 Beta3 15 32
Mar-05 Beta4 16 32
Mar-05 Beta5 17 32
Mar-05 Beta6 18 32
Apr-05 Beta1 19 32
Apr-05 Beta2 20 32
Apr-05 Beta3 21 32
Apr-05 Beta4 22 32
Apr-05 Beta5 23 32
Apr-05 Beta6 24 32
May-05 Beta1 25 32
May-05 Beta2 26 32
May-05 Beta3 27 32
May-05 Beta4 28 32
May-05 Beta5 29 32
May-05 Beta6 30 32
Jun-05 Beta1 31 32
Jun-05 Beta2 32 32
Jun-05 Beta3 33 32
Jun-05 Beta4 34 32
Jun-05 Beta5 35 32
Jun-05 Beta6 36 32
Jul-05 Beta1 37 32
Jul-05 Beta2 38 32
Jul-05 Beta3 39 32
Jul-05 Beta4 40 32
Jul-05 Beta5 41 32
Jul-05 Beta6 42 32
Aug-05 Beta1 43 32
Aug-05 Beta2 44 32
Aug-05 Beta3 45 32
Aug-05 Beta4 46 32
Aug-05 Beta5 47 32
Aug-05 Beta6 48 32
Sep-05 Beta1 49 32
Sep-05 Beta2 50 32
Sep-05 Beta3 51 32
Sep-05 Beta4 52 32
Sep-05 Beta5 53 32
Sep-05 Beta6 54 32
Oct-05 Beta1 55 32
Oct-05 Beta2 56 32
Oct-05 Beta3 57 32
Oct-05 Beta4 58 32
Oct-05 Beta5 59 32
Oct-05 Beta6 60 32
Nov-05 Beta1 61 32
Nov-05 Beta2 62 32
Nov-05 Beta3 63 32
Nov-05 Beta4 64 32
Nov-05 Beta5 65 32
Nov-05 Beta6 66 32
Dec-05 Beta1 67 32
Dec-05 Beta2 68 32
Dec-05 Beta3 69 32
Dec-05 Beta4 70 32
Dec-05 Beta5 71 32
Dec-05 Beta6 72 32
As it can be observed six separate measures Beta1, Beta2, Beta3, Beta4, Beta5 and Beta6 have been calculated for each month. Now I want to reshape this into as follows
Month Company Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
Jan-05 A 1 2 3 4 5 6
Feb-05 A 7 8 9 10 11 12
Mar-05 A 13 14 15 16 17 18
Apr-05 A 19 20 21 22 23 24
May-05 A 25 26 27 28 29 30
Jun-05 A 31 32 33 34 35 36
Jul-05 A 37 38 39 40 41 42
Aug-05 A 43 44 45 46 47 48
Sep-05 A 49 50 51 52 53 54
Oct-05 A 56 57 58 59 60 61
Nov-05 A 62 63 64 65 66 67
Dec-05 A 68 69 70 71 72 73
Jan-05 B 32 32 32 32 32 32
Feb-05 B 32 32 32 32 32 32
Mar-05 B 32 32 32 32 32 32
Apr-05 B 32 32 32 32 32 32
May-05 B 32 32 32 32 32 32
Jun-05 B 32 32 32 32 32 32
Jul-05 B 32 32 32 32 32 32
Aug-05 B 32 32 32 32 32 32
Sep-05 B 32 32 32 32 32 32
Oct-05 B 32 32 32 32 32 32
Nov-05 B 32 32 32 32 32 32
Dec-05 B 32 32 32 32 32 32
My original data set has over thousand columns. I have referred to enter link description here but my data set is different from this one. I would really appreciate your help in this regard.
Upvotes: 1
Views: 432
Reputation: 63
You can find the same answer in the Reshape pdf which is available online.
Please go through this link.
https://cran.r-project.org/web/packages/reshape/reshape.pdf
https://cran.r-project.org/web/packages/reshape2/reshape2.pdf
# This is an example.
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
cast(aqm, day ~ month ~ variable)
# Not only sorting there are many other possiblities.
cast(aqm, month ~ variable, mean)
cast(aqm, month ~ . | variable, mean)
cast(aqm, month ~ variable, mean, margins=c("grand_row", "grand_col"))
cast(aqm, day ~ month, mean, subset=variable=="ozone")
cast(aqm, month ~ variable, range)
cast(aqm, month ~ variable + result_variable, range)
cast(aqm, variable ~ month ~ result_variable,range)
If you follow all the examples, there is your answer. Reshape is handy package. May this helps you.
Upvotes: 0
Reputation: 28451
This is a classic recast
. The id.var
argument goes to melt
, the other arguments go to dcast
.
The first function melt
produces a long data frame specified with id.var=c("Month", "Beta")
. Therefore the output will combine columns "A" and "B" in a column called variable
and create a separate value
column. With that melted data.frame
We can then cast wide with dcast
. The formula Month+variable ~ Beta
has id variables on the Left of the tilde(~
) and measure variables on the right. Both functions are combined in recast
for convenience.
If the order matters, convert the Month column to dates and order at the end:
library(zoo)
library(reshape2)
df[,1] <- as.yearmon(df[,1], "%b-%y")
newdf <- recast(df, id.var=c("Month", "Beta"), Month+variable~Beta, value.var="value")
newdf <- newdf[order(newdf$variable,newdf$Month),]
newdf[,1] <- format(newdf$Month, "%b-%y")
newdf
# Month variable Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
# 1 Jan-05 A 1 2 3 4 5 6
# 3 Feb-05 A 7 8 9 10 11 12
# 5 Mar-05 A 13 14 15 16 17 18
# 7 Apr-05 A 19 20 21 22 23 24
# 9 May-05 A 25 26 27 28 29 30
# 11 Jun-05 A 31 32 33 34 35 36
# 13 Jul-05 A 37 38 39 40 41 42
# 15 Aug-05 A 43 44 45 46 47 48
# 17 Sep-05 A 49 50 51 52 53 54
# 19 Oct-05 A 55 56 57 58 59 60
# 21 Nov-05 A 61 62 63 64 65 66
# 23 Dec-05 A 67 68 69 70 71 72
# 2 Jan-05 B 32 32 32 32 32 32
# 4 Feb-05 B 32 32 32 32 32 32
# 6 Mar-05 B 32 32 32 32 32 32
# 8 Apr-05 B 32 32 32 32 32 32
# 10 May-05 B 32 32 32 32 32 32
# 12 Jun-05 B 32 32 32 32 32 32
# 14 Jul-05 B 32 32 32 32 32 32
# 16 Aug-05 B 32 32 32 32 32 32
# 18 Sep-05 B 32 32 32 32 32 32
# 20 Oct-05 B 32 32 32 32 32 32
# 22 Nov-05 B 32 32 32 32 32 32
# 24 Dec-05 B 32 32 32 32 32 32
Upvotes: 3
Reputation: 970
Using functions from package {tidyr}:
df <- data.frame(Month = c(rep("Jan-05", 6), rep("Feb-05",6)),
Beta = c(paste0("Beta",1:6), paste0("Beta",1:6)),
A = 1:12,
B=rep(32,12))
df2 <- tidyr::gather(df, key = Company, value = val, A, B)
df3 <- tidyr::spread(data = df2, key = Beta, value = val)
## Month Company Beta1 Beta2 Beta3 Beta4 Beta5 Beta6
##1 Feb-05 A 7 8 9 10 11 12
##2 Feb-05 B 32 32 32 32 32 32
##3 Jan-05 A 1 2 3 4 5 6
##4 Jan-05 B 32 32 32 32 32 32
Upvotes: 1