Reputation: 23
I have problems with rearranging my data rame so that it is suitable for panel analysis. The raw data looks like this (there are all countries and 50 years, that's just head):
head(suicide_data_panel)
country variable 1970 1971
Afghanistan suicide NA NA
Afghanistan unempl NA NA
Afghanistan hci NA NA
Afghanistan gini NA NA
Afghanistan inflation NA NA
Afghanistan cpi NA NA
I would like it to be:
country year suicide unempl
Afghanistan 1970 NA NA
Afghanistan 1971 NA NA
Afghanistan 1972 NA NA
Afghanistan 1973 NA NA
Afghanistan 1974 NA NA
Afghanistan 1975 NA NA
So that I can run panel regression. I've tried to use dcast but I don't know how to make it account for different years:
suicide <- dcast(suicide_data_panel, country~variable, sum)
This command will result in taking the last year only into account:
head(suicide)
country account alcohol
1 Afghanistan -18.874843 NA
2 Albania -6.689212 NA
3 Algeria NA NA
4 American Samoa NA NA
5 Andorra NA NA
6 Angola 7.000035 NA
It sorts variables alphabetically. Please help.
Upvotes: 0
Views: 99
Reputation: 91
You can do this by first: using MELT function with your ID variables "country" and "variable"; and, second: using dcast function to transform "variable" into individual columns.
Upvotes: 1
Reputation: 73262
Following a reshape
approach.
tm <- by(dat, dat$variable, reshape, varying=3:4, idvar="country",
direction="long", sep="", timevar="year", drop=2)
res <- cbind(el(tm)[1:2], data.frame(mapply(`[[`, tm, 3)))
res
# country year hci suicide unempl
# DE.1970 DE 1970 1.51152200 1.3709584 0.6328626
# AT.1970 AT 1970 -0.09465904 -0.5646982 0.4042683
# CH.1970 CH 1970 2.01842371 0.3631284 -0.1061245
# DE.1971 DE 1971 0.63595040 -0.0627141 -1.3888607
# AT.1971 AT 1971 -0.28425292 1.3048697 -0.2787888
# CH.1971 CH 1971 -2.65645542 2.2866454 -0.1333213
Data
set.seed(42)
dat <- cbind(expand.grid(country=c("DE", "AT", "CH"),
variable=c("suicide", "unempl", "hci"),
stringsAsFactors=F), x1970=rnorm(9), x1971=rnorm(9))
Upvotes: 0
Reputation: 11981
You coul try to use the tidyverse
package:
library(tidyverse)
suicide_data_panel %>%
gather(year, dummy, -country, -variable) %>%
spread(variable, dummy)
Upvotes: 1