Reputation: 5049
I am trying to create a "long" data frame of indicator ("dummy") variables out of a very peculiar type of "wide" data frame in R that has interval ranges of years defining my data.
What I have looks like this:
f=data.frame(name=c("A","B","C"),
year.start=c(1990,1994,1993),year.end=c(1994,1995,1993))
name year.start year.end
1 A 1990 1994
2 B 1994 1995
3 C 1993 1993
Update: I have changed the value of year.start for A to 1990 from the initial example of 1993 to address some of the answers below which rely on unique values instead of intervals.
What I would like is a long data frame that would look like this, with an entry for each of the possible years in the original data frame, eg, 1990 through 1995 where 1 = present and 0 = absent.
name year indicator
A 1990 1
A 1991 1
A 1992 1
A 1993 1
A 1994 1
A 1995 0
B 1990 0
B 1991 0
B 1992 0
B 1993 0
B 1994 1
B 1995 1
C 1990 0
C 1991 0
C 1992 0
C 1993 1
C 1994 0
C 1995 0
Try as I might, I don't see how I can do this with Hadley Wickham's reshape2 package.
Thanks!
Upvotes: 1
Views: 363
Reputation: 1437
Another base R solution
f=data.frame(name=c("A","B","C"),
year.start=c(1993,1994,1993),year.end=c(1994,1995,1993), stringsAsFactors=F)
x <- expand.grid(unique(f$name),min(f1$year):max(f1$year))
names(x) <- c("name", "year")
x$indicator <- sapply(1:nrow(x), function(i) sum(x$name[i]==f$name & x$year[i] >= f$year.start & x$year[i] <= f$year.end))
x[order(x$name),]
Upvotes: 0
Reputation: 44614
Someone else might have suggestion for reshape2
, but here is a base R solution:
years <- factor(unlist(f[-1]), levels=seq(min(f[-1]), max(f[-1]), by=1))
result <- data.frame(table(years, rep(f[[1]], length.out=length(years))))
# years Var2 Freq
# 1 1990 A 1
# 2 1991 A 0
# 3 1992 A 0
# 4 1993 A 0
# 5 1994 A 1
# 6 1995 A 0
# 7 1990 B 0
# 8 1991 B 0
# 9 1992 B 0
# 10 1993 B 0
# 11 1994 B 1
# 12 1995 B 1
# 13 1990 C 0
# 14 1991 C 0
# 15 1992 C 0
# 16 1993 C 2
# 17 1994 C 0
# 18 1995 C 0
Upvotes: 3
Reputation: 59365
Here's another solution, similar to the ones above, which aims to be straightforward:
zz <- cbind(name=f[1],year=rep(min(f[-1]):max(f[-1]),each=nrow(f)))
zz$indicator <- as.numeric((f$name==zz$name &
f$year.start<=zz$year &
f$year.end >=zz$year))
result <- zz[order(zz$name,zz$year),]
The first line builds a template with all the names and all the years. The second line sets indicator
based on whether it is present in the range. The third line just reorders the result.
Upvotes: 1
Reputation: 55360
here is a step-by-step breakdown, using data.table
library(data.table)
f <- as.data.table(f)
## ALL OF NAME-YEAR COMBINATIONS
ALL <- f[, CJ(name=name, year=seq(min(year.start), max(year.end)))]
## WHICH COMBINATIONS EXIST
PRESENT <- f[, list(year = seq(year.start, year.end)), by=name]
## SETKEYS FOR MERGING
setkey(ALL, name, year)
setkey(PRESENT, name, year)
## INITIALIZE INDICATOR TO ZERO, THEN SET TO 1 FOR THOSE PRESENT
ALL[, indicator := 0]
ALL[PRESENT, indicator := 1]
ALL
name year indicator
1: A 1993 1
2: A 1994 1
3: A 1995 0
4: B 1993 0
5: B 1994 1
6: B 1995 1
7: C 1993 1
8: C 1994 0
9: C 1995 0
Upvotes: 2