Parseltongue

Reputation: 11667

R: Best way to create a new dummy variable based on numerous cutoff conditions?

What is the easiest way to create a dummy variable given a number of conditions?

For example, say I have the following data frame (data):

birth    ID
1958     176
1958     178
1959     300
1959     301
1960     500
1960     600
1961     216
1961     201
1962     100

I want to create a new variable, eligible, that is 1 if any one of the following conditions is satisfied (and 0 otherwise):

birth year is 1958 and ID is greater than 175; birth year is 1959 and ID is greater than 320; birth year is 1960 and ID is greater than 341; and so on.

I know I can do this with numerous ifelse commands, but I was hoping there was a more parsimonious way of doing this.
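For reference, the ifelse version described above might look like the sketch below. It assumes only the three cut points listed and treats birth years without a listed cut point as ineligible; every additional year means another hard-coded term, which is what makes this approach tedious.

```r
data <- data.frame(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
                   ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L))

# Baseline: one hard-coded condition per birth year, OR-ed together
data$eligible <- ifelse(
  (data$birth == 1958 & data$ID > 175) |
  (data$birth == 1959 & data$ID > 320) |
  (data$birth == 1960 & data$ID > 341),
  1, 0
)
data$eligible
# [1] 1 1 0 0 1 1 0 0 0
```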

Data

data <- structure(list(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
                       ID = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L)),
                  .Names = c("birth", "ID"), class = "data.frame", row.names = c(NA, -9L))

Upvotes: 3

Views: 556

Answers (4)

eipi10

Reputation: 93851

You can use paste to build a string containing the logical conditions, then parse and evaluate that string inside ifelse.

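# One "(birth == year & ID > cutoff)" term per year, joined with " | "
cond <- paste0("(data$birth == ", 1958:1960, " & data$ID > ", c(175, 320, 341), ")", collapse = " | ")

# Parse the string back into an R expression and evaluate it
ifelse(eval(parse(text = cond)), 1, 0)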
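To make the mechanism concrete, here is a self-contained sketch that builds the condition string, prints the expression that eval(parse()) ends up running, and evaluates it. It compares birth years with ==, matching the conditions in the question; the names years and cuts are only illustrative.

```r
data <- data.frame(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
                   ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L))

years <- 1958:1960
cuts  <- c(175, 320, 341)

# Build "(data$birth == <year> & data$ID > <cut>)" for each pair and join with " | "
cond <- paste0("(data$birth == ", years, " & data$ID > ", cuts, ")", collapse = " | ")
writeLines(cond)
# (data$birth == 1958 & data$ID > 175) | (data$birth == 1959 & data$ID > 320) | (data$birth == 1960 & data$ID > 341)

# parse() turns the text into an expression; eval() runs it in the current environment
eligible <- ifelse(eval(parse(text = cond)), 1, 0)
eligible
# [1] 1 1 0 0 1 1 0 0 0
```

Generating code as text is flexible but fragile (quoting, scoping, and errors that only appear at evaluation time), which is why the other answers avoid eval(parse()).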

Upvotes: 2

rawr

Reputation: 20811

Yet another way

Say you have a vector of years matched one-to-one with ID cut points, e.g.

year <- c(1958, 1959, 1960)   # each birth year appears once
id   <- c(175, 320, 341)      # cut point for the year in the same position

cbind(year, id)
#      year  id
# [1,] 1958 175
# [2,] 1959 320
# [3,] 1960 341

Use match to look up each row's cut point (years without a cut point get NA and end up as 0):

within(data, {
  cutoff <- id[match(birth, year)]          # NA when the year has no cut point
  idx <- +(!is.na(cutoff) & ID > cutoff)    # 1 if ID exceeds its year's cut point
  rm(cutoff)                                # drop the helper column
})

#   birth  ID idx
# 1  1958 176   1
# 2  1958 178   1
# 3  1959 300   0
# 4  1959 301   0
# 5  1960 500   1
# 6  1960 600   1
# 7  1961 216   0
# 8  1961 201   0
# 9  1962 100   0
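A closely related sketch replaces match with a named vector of cut points indexed by birth year. The name cutoffs is hypothetical; indexing a named vector with a name it does not contain returns NA, and %in% TRUE then maps those rows to 0.

```r
data <- data.frame(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
                   ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L))

# Hypothetical lookup: names are birth years, values are ID cut points
cutoffs <- c("1958" = 175, "1959" = 320, "1960" = 341)

# Indexing by name gives each row its cut point, or NA for years not listed
row_cut <- cutoffs[as.character(data$birth)]

# (ID > NA) is NA; %in% TRUE keeps TRUE and turns both FALSE and NA into FALSE
data$eligible <- as.integer((data$ID > row_cut) %in% TRUE)
data$eligible
# [1] 1 1 0 0 1 1 0 0 0
```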

Upvotes: 6

Colonel Beauvel

Reputation: 31171

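This approach uses a join against a lookup table of cut points:

library(data.table)

# One row per birth year, holding that year's ID cut point
lookupDF <- data.table(birth = 1958:1960, cutoff = c(175, 320, 341))

# setDT() converts data to a data.table by reference; the join keeps every row of data,
# and birth years missing from lookupDF get NA
lookupDF[setDT(data), on = "birth"][, eligible := +(ID > cutoff)][]
#    birth cutoff  ID eligible
# 1:  1958    175 176        1
# 2:  1958    175 178        1
# 3:  1959    320 300        0
# 4:  1959    320 301        0
# 5:  1960    341 500        1
# 6:  1960    341 600        1
# 7:  1961     NA 216       NA
# 8:  1961     NA 201       NA
# 9:  1962     NA 100       NA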
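For readers without data.table, a base R sketch of the same join idea uses merge() with all.x = TRUE (a left join). The names lookup and cutoff are illustrative, and unmatched years are set to 0 here rather than left as NA.

```r
data <- data.frame(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
                   ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L))

# One row per birth year with its ID cut point (illustrative names)
lookup <- data.frame(birth = 1958:1960, cutoff = c(175, 320, 341))

# Left join: keep every row of data; cutoff is NA for years not in lookup
res <- merge(data, lookup, by = "birth", all.x = TRUE)

# Eligible when a cut point exists and ID exceeds it (FALSE & NA is FALSE)
res$eligible <- as.integer(!is.na(res$cutoff) & res$ID > res$cutoff)
res
```

Note that merge() sorts the result by the key column, so if the original row order matters, add a row index before merging and reorder by it afterwards.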

Upvotes: 4

MrFlick

Reputation: 206253

You can use a Reduce-style operation. For example:

years <- 1958:1960
ids <- c(175, 320, 341)
Reduce(function(a, b) {
    a | (data$birth == b[[1]] & data$ID > b[[2]])   # OR in this year's condition
}, Map(list, years, ids), init = FALSE)

Here Map pairs each year with its ID cut point, and Reduce iterates over those pairs, OR-ing the conditions together starting from FALSE. The result is TRUE for any row that satisfies at least one condition; wrap it in as.integer() to get a 0/1 dummy.
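An equivalent formulation, sketched below, first builds one logical vector per condition with Map and then combines them with Reduce(`|`, ...). The helper names conds and cutoff are illustrative; as.integer() converts the combined result into the 0/1 dummy the question asks for.

```r
data <- data.frame(birth = c(1958L, 1958L, 1959L, 1959L, 1960L, 1960L, 1961L, 1961L, 1962L),
                   ID    = c(176L, 178L, 300L, 301L, 500L, 600L, 216L, 201L, 100L))

years <- 1958:1960
ids   <- c(175, 320, 341)

# One logical vector per (year, cut point) pair
conds <- Map(function(y, cutoff) data$birth == y & data$ID > cutoff, years, ids)

# OR the vectors together element-wise, then turn TRUE/FALSE into 1/0
data$eligible <- as.integer(Reduce(`|`, conds))
data$eligible
# [1] 1 1 0 0 1 1 0 0 0
```

Building the vectors first makes each condition easy to inspect on its own, at the cost of holding one logical vector per year in memory.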

Upvotes: 4
