Reputation: 101
I'm currently using the Ethnic Power Relations 2014 data set. Here's a small snippet of the data that I'm trying to manipulate:
statename from to gwgroupid size
[,1] United States 1966 2008 201000 0.691
[,2] United States 1966 2008 201000 0.125
[,3] United States 1966 2008 203000 0.124
where from and to are the first and last year of the observation, and gwgroupid is a marker for a particular ethnic group in a particular country.
I'd like to expand the data set so that it records an observation for every year in the range delineated by from and to, and then deletes from and to. The first three rows of the expanded data set would look like:
statename year gwgroupid size
[,1] United States 1966 201000 0.691
[,2] United States 1967 201000 0.691
[,3] United States 1968 201000 0.691
How can I do this given that each country has a different range of years?
Upvotes: 0
Views: 867
Reputation: 13304
You can use the unnest
function from the tidyr
package:
library(tidyr)
df$year <- mapply(seq,df$from,df$to,SIMPLIFY=FALSE)
df %>%
unnest(year) %>%
select(-from,-to)
# statename gwgroupid size year
#1 UnitedStates 201000 0.691 1966
#2 UnitedStates 201000 0.691 1967
#3 UnitedStates 201000 0.691 1968
[Update] Alternatively, you can use the data.table
package:
library(data.table)
as.data.table(df)[,.(year=seq(from,to)),by=.(statename,gwgroupid,size)]
Upvotes: 1
Reputation: 14192
This does it... there may be a cleaner, quicker way-
your data:
df<-
read.table(text="
statename from to gwgroupid size
UnitedStates 1966 2008 201000 0.691
UnitedStates 1966 2008 202000 0.125
UnitedStates 1966 2008 203000 0.124", header=T)
library(dplyr)
df$freq <- df$to - df$from
df.expanded <- df[rep(row.names(df), df$freq), 1:5]
df.expanded %>%
group_by(statename) %>%
mutate(year = from + row_number(from)) %>%
select(statename, year, gwgroupid, size)
to get:
statename year gwgroupid size
1 UnitedStates 1967 201000 0.691
2 UnitedStates 1968 201000 0.691
3 UnitedStates 1969 201000 0.691
4 UnitedStates 1970 201000 0.691
5 UnitedStates 1971 201000 0.691
6 UnitedStates 1972 201000 0.691
7 UnitedStates 1973 201000 0.691
8 UnitedStates 1974 201000 0.691
9 UnitedStates 1975 201000 0.691
10 UnitedStates 1976 201000 0.691
.. ... ... ... ...
edit: just noticed that your results require 'gwgroupid' to increase on rows 1-3 but the size stays the same.... is your desired result correct?
Upvotes: 0