user46226
user46226

Reputation: 101

Expand year range in R

I'm currently using the Ethnic Power Relations 2014 data set. Here's a small snippet of the data that I'm trying to manipulate:

     statename       from   to    gwgroupid size
[,1] United States   1966   2008  201000    0.691
[,2] United States   1966   2008  201000    0.125
[,3] United States   1966   2008  203000    0.124

where from and to are the first and last year of the observation, and gwgroupid is a marker for a particular ethnic group in a particular country.

I'd like to expand the data set so that it records an observation for every year in the range delineated by from and to, and then deletes from and to. The first three rows of the expanded data set would look like:

     statename       year    gwgroupid size
[,1] United States   1966    201000    0.691
[,2] United States   1967    201000    0.691
[,3] United States   1968    201000    0.691

How can I do this given that each country has a different range of years?

Upvotes: 0

Views: 867

Answers (2)

Marat Talipov
Marat Talipov

Reputation: 13304

You can use the unnest function from the tidyr package:

library(tidyr)

df$year <- mapply(seq,df$from,df$to,SIMPLIFY=FALSE)

df %>% 
  unnest(year) %>% 
  select(-from,-to)

#       statename gwgroupid  size year
#1   UnitedStates    201000 0.691 1966
#2   UnitedStates    201000 0.691 1967
#3   UnitedStates    201000 0.691 1968

[Update] Alternatively, you can use the data.table package:

library(data.table)
as.data.table(df)[,.(year=seq(from,to)),by=.(statename,gwgroupid,size)]

Upvotes: 1

jalapic
jalapic

Reputation: 14192

This does it... there may be a cleaner, quicker way-

your data:

df<-
read.table(text="
statename       from   to    gwgroupid size
UnitedStates   1966   2008  201000    0.691
UnitedStates   1966   2008  202000    0.125
UnitedStates   1966   2008  203000    0.124", header=T)


library(dplyr)
df$freq <- df$to - df$from
    df.expanded <- df[rep(row.names(df), df$freq), 1:5]


    df.expanded %>% 
      group_by(statename) %>%
      mutate(year = from + row_number(from)) %>% 
     select(statename, year, gwgroupid, size) 

to get:

      statename year gwgroupid  size
1  UnitedStates 1967    201000 0.691
2  UnitedStates 1968    201000 0.691
3  UnitedStates 1969    201000 0.691
4  UnitedStates 1970    201000 0.691
5  UnitedStates 1971    201000 0.691
6  UnitedStates 1972    201000 0.691
7  UnitedStates 1973    201000 0.691
8  UnitedStates 1974    201000 0.691
9  UnitedStates 1975    201000 0.691
10 UnitedStates 1976    201000 0.691
..          ...  ...       ...   ...

edit: just noticed that your results require 'gwgroupid' to increase on rows 1-3 but the size stays the same.... is your desired result correct?

Upvotes: 0

Related Questions