Sukanya Iyer
Sukanya Iyer

Reputation: 11

Is there a way to create a panel dataset using the gather function for variables w/o creating separate datasets for each variable?

I am trying to create a panel dataset for multiple variables across years. My data is in the following format in RStudio:

     2012 2012 2012      2012  2013 2013   2013 .....
Name Var1 Var2 Var3......Var10 Var1 Var2...Var10.....
X      1    4   20 .......
Y      2    7   25 ......
Z      3    9   26 ...... and so on for each variable across years

I want to create a panel dataset for these that looks like

Name  Year Var1 Var2 Var3....Var10
X     2012 
Y     2012 
Z     2012 
X     2013
Y     2013 
Z     2013

I have created the panel in the following steps:

  1. Merged top two rows so the variables read as - 2012.Var1, 2012.Var2

  2. Used the gather function to create a separate dataset for each variable as

    Name  Year       Var1
    X    2012.Var1   1
    Y    2012.Var2   2
    X    2013.Var1 and so on
    
  3. I then combined the sheets to form one dataframe in the desired format

For step 1:

colnames(df) <- paste(colnames(df), df[1, ])

For step 2:

Var1 <- df %>% gather(year, Var1, `2012.Var1`,`2012.Var2`and so on) 
Creating a Dataset for Var1
- did the same for all the variables

For step 3:

new_data = cbind(Var1,Var2, ..., Var10) # Combining Different Datasets
names(new_data)

While the code worked to create the dataframe, it is however, extremely tedious and time consuming when the variables are large in number and spread across years. I'm looking for an easier way to create the same dataframe?

Upvotes: 1

Views: 209

Answers (1)

Ashwin Malshe
Ashwin Malshe

Reputation: 141

I don’t have a solution using gather but you can achieve this in one shot by using merge.stack function from splitstackshape package. https://cran.r-project.org/web/packages/splitstackshape/splitstackshape.pdf#page18

I will add a simple example:

  1. Create a sample dataset. In this example, I am creating variable names with an "" as a separator and the years appear after "".
library(splitstackshape)

set.seed(123)
dt1 <- data.frame(id = LETTERS[1:10],
                  var1_2012 = runif(10, min = 1, max = 10),
                  var2_2012 = runif(10, min = 1, max = 10),
                  var1_2013 = runif(10, min = 1, max = 10),
                  var2_2013 = runif(10, min = 1, max = 10))

print(dt1)
   id var1_2012 var2_2012 var1_2013 var2_2013
1   A  3.588198  9.611500  9.005854  9.667218
2   B  8.094746  5.080007  7.235231  9.120691
3   C  4.680792  7.098136  6.764561  7.216348
4   D  8.947157  6.153701  9.948428  8.159207
5   E  9.464206  1.926322  6.901352  1.221523
6   F  1.410008  9.098425  7.376774  5.300164
7   G  5.752949  3.214790  5.896594  7.826136
8   H  9.031771  1.378536  6.347278  2.947671
9   I  5.962915  3.951286  3.602438  3.863629
10  J  5.109533  9.590533  2.324023  3.084632
  1. Now use merge.stack to get it in the format that you want. Here var.stubs is the vector of the variable prefixes that you want to retain in columns. sep argument specifies the separator that you want to use split the variable names.
dt2 <- splitstackshape::merged.stack(dt1,
                                     id.vars = "id",
                                     var.stubs = c("var1", "var2"),
                                     sep = "_",
                                     keep.all = TRUE)

print(dt2)

    id .time_1     var1     var2
 1:  A    2012 3.588198 9.611500
 2:  A    2013 9.005854 9.667218
 3:  B    2012 8.094746 5.080007
 4:  B    2013 7.235231 9.120691
 5:  C    2012 4.680792 7.098136
 6:  C    2013 6.764561 7.216348
 7:  D    2012 8.947157 6.153701
 8:  D    2013 9.948428 8.159207
 9:  E    2012 9.464206 1.926322
10:  E    2013 6.901352 1.221523
11:  F    2012 1.410008 9.098425
12:  F    2013 7.376774 5.300164
13:  G    2012 5.752949 3.214790
14:  G    2013 5.896594 7.826136
15:  H    2012 9.031771 1.378536
16:  H    2013 6.347278 2.947671
17:  I    2012 5.962915 3.951286
18:  I    2013 3.602438 3.863629
19:  J    2012 5.109533 9.590533
20:  J    2013 2.324023 3.084632

Upvotes: 0

Related Questions