Reputation: 11667
What is the easiest way to create a dummy variable given a number of conditions?
For example, let's say I have the following dataframe (data):
birth ID
1958 176
1958 178
1959 300
1959 301
1960 500
1960 600
1961 216
1961 201
1962 100
I want to create a new variable, eligible, that is 1 if any one of the following conditions is satisfied:
Birth year is 1958 and ID is greater than 175; birth year is 1959 and ID is greater than 320; birth year is 1960 and ID is greater than 341; and so on.
I know I can do this with numerous ifelse commands, but I was hoping there was a more parsimonious way of doing this.
Data
data <- structure(list(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
ID = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)),
.Names = c("birth", "ID"), class = "data.frame", row.names = c(NA, -9L))
Upvotes: 3
Views: 556
Reputation: 93851
You can use paste to create a string with the logical conditions. The string then needs to be evaluated for use in ifelse.
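# Build one "(year test & ID test)" clause per year and join the clauses with |
cond = paste("(data$birth ==", 1958:1960, "& data$ID >", c(175, 320, 341), ")", collapse=" | ")
# Parse and evaluate the string, then turn TRUE/FALSE into 1/0
ifelse(eval(parse(text=cond)), 1, 0)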
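As a self-contained sketch of the steps above, the snippet below builds the condition string for the three cut points given in the question and stores the result in an eligible column. The names years and cutoffs are illustrative, and the example assumes the year test is an equality and that only the 1958 to 1960 cut points are known.

```r
# Example data from the question
data <- data.frame(
  birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
  ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)
)

# Illustrative cut points: one ID threshold per birth year
years   <- 1958:1960
cutoffs <- c(175, 320, 341)

# One clause per year, joined with | into a single expression string
cond <- paste0("(data$birth == ", years, " & data$ID > ", cutoffs, ")",
               collapse = " | ")

# Parse and evaluate the string, then recode TRUE/FALSE as 1/0
data$eligible <- ifelse(eval(parse(text = cond)), 1, 0)
```

Rows whose birth year has no cut point (1961 and 1962 here) end up as 0, because none of the clauses is TRUE for them.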
Upvotes: 2
Reputation: 20811
Yet another way
data <- structure(list(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
ID = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)),
.Names = c("birth", "ID"), class = "data.frame", row.names = c(NA, -9L))
Say you have a vector of years matched 1-1 with ID cut points, e.g.
year <- data$birth
id <- c(175, 320, 341, seq(360, 1000, length.out = 6))
cbind(year, id)
# year id
# [1,] 1958 175
# [2,] 1958 320
# [3,] 1959 341
# [4,] 1959 360
# [5,] 1960 488
# [6,] 1960 616
# [7,] 1961 744
# [8,] 1961 872
# [9,] 1962 1000
Use match
within(data, idx <- +(ID[match(birth, year)] >= id))
# birth ID idx
# 1 1958 176 1
# 2 1958 178 0
# 3 1959 300 0
# 4 1959 301 0
# 5 1960 500 1
# 6 1960 600 0
# 7 1961 216 0
# 8 1961 201 0
# 9 1962 100 0
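If the cut points are instead kept as one row per birth year, as in the question, match can look up each row's threshold. This is a minimal sketch under that assumption. The cutoffs table and its min_id column are hypothetical names, and years without a cut point are treated as not eligible.

```r
# Example data from the question
data <- data.frame(
  birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
  ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)
)

# Hypothetical lookup table: one ID threshold per birth year
cutoffs <- data.frame(birth = 1958:1960, min_id = c(175, 320, 341))

# Threshold for each row of data; NA when the year has no cut point
thr <- cutoffs$min_id[match(data$birth, cutoffs$birth)]

# Unary + converts the logical result to integer 0/1
data$eligible <- +(!is.na(thr) & data$ID > thr)
```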
Upvotes: 6
Reputation: 31171
You need to tweak the column names, but this approach uses a join:
library(data.table)
lookupDF = data.table(birth=c(1958,1959,1960), ID=c(175,320,341))
# After the join, ID is the cut point from lookupDF and i.ID is the ID from data
lookupDF[setDT(data), on='birth'][, ID := +(i.ID > ID)][]
# birth ID i.ID
#1: 1958 1 176
#2: 1958 1 178
#3: 1959 0 300
#4: 1959 0 301
#5: 1960 1 500
#6: 1960 1 600
#7: 1961 NA 216
#8: 1961 NA 201
#9: 1962 NA 100
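The same join idea can be sketched in base R with merge. This is only an illustration: the lookup table and its cutoff column are hypothetical names, and years without a cut point are left as NA rather than 0.

```r
# Example data from the question
data <- data.frame(
  birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
  ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)
)

# Hypothetical lookup table: one cut point per birth year
lookup <- data.frame(birth = 1958:1960, cutoff = c(175, 320, 341))

# Left join: keep every row of data, with NA cutoff for unmatched years
out <- merge(data, lookup, by = "birth", all.x = TRUE)

# 1 when the ID exceeds the cut point, 0 when it does not, NA when unknown
out$eligible <- +(out$ID > out$cutoff)
```

Note that merge sorts the result by the join column, so the row order of out may differ from data when the input is not already sorted by birth.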
Upvotes: 4
Reputation: 206253
You can use a Reduce-type operation. For example:
years <- 1958:1960
ids <- c(175, 320, 341)
Reduce(function(a, b) {
  # b is one list(year, id) pair; OR its condition into the running result a
  a | (data$birth == b[[1]] & data$ID > b[[2]])
}, Map(list, years, ids), init = FALSE)
Here we use Map to make year/ID pairs and then iterate over them with Reduce. Basically it's OR-ing all the conditions together. This will return TRUE for any row that matches.
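As a variation on the same idea, a minimal sketch that first builds one logical vector per year/ID pair with Map and then ORs them with Reduce, storing the result as a 0/1 column. The names hits and eligible are illustrative.

```r
# Example data from the question
data <- data.frame(
  birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
  ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)
)

years <- 1958:1960
ids   <- c(175, 320, 341)

# One logical vector per (year, id) pair
hits <- Map(function(y, i) data$birth == y & data$ID > i, years, ids)

# Element-wise OR across all pairs, then convert TRUE/FALSE to 1/0
data$eligible <- as.integer(Reduce(`|`, hits))
```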
Upvotes: 4