Honorata Bogusz
Honorata Bogusz

Reputation: 23

Rearranging data frame in R for panel analysis

I have problems with rearranging my data rame so that it is suitable for panel analysis. The raw data looks like this (there are all countries and 50 years, that's just head):

head(suicide_data_panel)

country     variable  1970  1971 
Afghanistan suicide   NA    NA          
Afghanistan unempl    NA    NA          
Afghanistan hci       NA    NA      
Afghanistan gini      NA    NA          
Afghanistan inflation NA    NA          
Afghanistan cpi       NA    NA          

I would like it to be:

country     year    suicide  unempl 
Afghanistan 1970      NA    NA          
Afghanistan 1971      NA    NA          
Afghanistan 1972      NA    NA      
Afghanistan 1973      NA    NA          
Afghanistan 1974      NA    NA          
Afghanistan 1975      NA    NA

So that I can run panel regression. I've tried to use dcast but I don't know how to make it account for different years:

suicide <- dcast(suicide_data_panel, country~variable, sum)

This command will result in taking the last year only into account:

head(suicide)

    country         account    alcohol     
1   Afghanistan     -18.874843  NA  
2   Albania         -6.689212   NA  
3   Algeria         NA          NA  
4   American Samoa  NA          NA      
5   Andorra         NA          NA      
6   Angola          7.000035    NA

It sorts variables alphabetically. Please help.

Upvotes: 0

Views: 99

Answers (3)

Peter Alexander
Peter Alexander

Reputation: 91

You can do this by first: using MELT function with your ID variables "country" and "variable"; and, second: using dcast function to transform "variable" into individual columns.

Upvotes: 1

jay.sf
jay.sf

Reputation: 73262

Following a reshape approach.

tm <- by(dat, dat$variable, reshape, varying=3:4, idvar="country", 
         direction="long", sep="", timevar="year", drop=2)
res <- cbind(el(tm)[1:2], data.frame(mapply(`[[`, tm, 3)))
res
#         country year         hci    suicide     unempl
# DE.1970      DE 1970  1.51152200  1.3709584  0.6328626
# AT.1970      AT 1970 -0.09465904 -0.5646982  0.4042683
# CH.1970      CH 1970  2.01842371  0.3631284 -0.1061245
# DE.1971      DE 1971  0.63595040 -0.0627141 -1.3888607
# AT.1971      AT 1971 -0.28425292  1.3048697 -0.2787888
# CH.1971      CH 1971 -2.65645542  2.2866454 -0.1333213

Data

set.seed(42)
dat <- cbind(expand.grid(country=c("DE", "AT", "CH"),
                         variable=c("suicide", "unempl", "hci"),
             stringsAsFactors=F), x1970=rnorm(9), x1971=rnorm(9))

Upvotes: 0

Cettt
Cettt

Reputation: 11981

You coul try to use the tidyverse package:

library(tidyverse)

suicide_data_panel %>%
  gather(year, dummy, -country, -variable) %>%
  spread(variable, dummy)

Upvotes: 1

Related Questions