Reputation: 11
I am trying to create a panel dataset for multiple variables across years. My data is in the following format in RStudio:
2012 2012 2012 2012 2013 2013 2013 .....
Name Var1 Var2 Var3......Var10 Var1 Var2...Var10.....
X 1 4 20 .......
Y 2 7 25 ......
Z 3 9 26 ...... and so on for each variable across years
I want to create a panel dataset for these that looks like
Name Year Var1 Var2 Var3....Var10
X 2012
Y 2012
Z 2012
X 2013
Y 2013
Z 2013
I have created the panel in the following steps:
Merged top two rows so the variables read as - 2012.Var1, 2012.Var2
Used the gather function to create a separate dataset for each variable as
Name Year Var1
X 2012.Var1 1
Y 2012.Var2 2
X 2013.Var1 and so on
I then combined the sheets to form one dataframe in the desired format
For step 1:
colnames(df) <- paste(colnames(df), df[1, ])
For step 2:
Var1 <- df %>% gather(year, Var1, `2012.Var1`,`2012.Var2`and so on)
Creating a Dataset for Var1
- did the same for all the variables
For step 3:
new_data = cbind(Var1,Var2, ..., Var10) # Combining Different Datasets
names(new_data)
While the code worked to create the dataframe, it is however, extremely tedious and time consuming when the variables are large in number and spread across years. I'm looking for an easier way to create the same dataframe?
Upvotes: 1
Views: 209
Reputation: 141
I don’t have a solution using gather
but you can achieve this in one shot by using merge.stack
function from splitstackshape
package. https://cran.r-project.org/web/packages/splitstackshape/splitstackshape.pdf#page18
I will add a simple example:
library(splitstackshape)
set.seed(123)
dt1 <- data.frame(id = LETTERS[1:10],
var1_2012 = runif(10, min = 1, max = 10),
var2_2012 = runif(10, min = 1, max = 10),
var1_2013 = runif(10, min = 1, max = 10),
var2_2013 = runif(10, min = 1, max = 10))
print(dt1)
id var1_2012 var2_2012 var1_2013 var2_2013
1 A 3.588198 9.611500 9.005854 9.667218
2 B 8.094746 5.080007 7.235231 9.120691
3 C 4.680792 7.098136 6.764561 7.216348
4 D 8.947157 6.153701 9.948428 8.159207
5 E 9.464206 1.926322 6.901352 1.221523
6 F 1.410008 9.098425 7.376774 5.300164
7 G 5.752949 3.214790 5.896594 7.826136
8 H 9.031771 1.378536 6.347278 2.947671
9 I 5.962915 3.951286 3.602438 3.863629
10 J 5.109533 9.590533 2.324023 3.084632
merge.stack
to get it in the format that you want. Here var.stubs
is the vector of the variable prefixes that you want to retain in columns. sep
argument specifies the separator that you want to use split the variable names. dt2 <- splitstackshape::merged.stack(dt1,
id.vars = "id",
var.stubs = c("var1", "var2"),
sep = "_",
keep.all = TRUE)
print(dt2)
id .time_1 var1 var2
1: A 2012 3.588198 9.611500
2: A 2013 9.005854 9.667218
3: B 2012 8.094746 5.080007
4: B 2013 7.235231 9.120691
5: C 2012 4.680792 7.098136
6: C 2013 6.764561 7.216348
7: D 2012 8.947157 6.153701
8: D 2013 9.948428 8.159207
9: E 2012 9.464206 1.926322
10: E 2013 6.901352 1.221523
11: F 2012 1.410008 9.098425
12: F 2013 7.376774 5.300164
13: G 2012 5.752949 3.214790
14: G 2013 5.896594 7.826136
15: H 2012 9.031771 1.378536
16: H 2013 6.347278 2.947671
17: I 2012 5.962915 3.951286
18: I 2013 3.602438 3.863629
19: J 2012 5.109533 9.590533
20: J 2013 2.324023 3.084632
Upvotes: 0