ayeepi

Reputation: 195

How to create a new variable on condition of others in R

I have the following data frame:

ID   Measurement A      Measurement  B     Date of Measurements A and B   Date of Measurement C
1    23                 24                 12                             16
1    22                 23                 12                             15
1    24                 22                 12                             17
1    21                 20                 12                             11
1    27                 29                 12                             17

This is an example using one identifier (ID); in reality I have thousands.

I want to create a variable which encapsulates:

"if this ID's Measurement A OR Measurement B is > xxx, before the date of Measurement C, ON MORE THAN TWO OCCASIONS, then designate them a 1 in a new column called new_var".

So far, I have removed all rows where the date of Measurements A and B is not before the date of Measurement C:

measurements <- subset(measurements, dateofmeasurementsAandB < dateofmeasurementC)

And then I added the cutoffs in an ifelse statement:

measurements$new_var<- ifelse(measurements$measurementA >= xxx | measurements$measurementB >= xxx, 1, 0)

But I can't factor in the 'on more than one occasion' bit (as you can see from the example, each ID has multiple rows/occasions).

Any help would be great, especially if it could be done more simply!

Upvotes: 2

Views: 334

Answers (1)

SirTain

Reputation: 369

If I understand what you're asking, I think I would use dplyr's count function:

# Starting from your data frame
library(tidyverse)
# Keep only rows dated before Measurement C where A or B meets the cutoff
df <- measurements %>%
         filter(dateofmeasurementsAandB < dateofmeasurementC,
                measurementA >= xxx | measurementB >= xxx)

This data frame should only contain the rows that meet your conditions, so now we count them per ID and filter the result:

df <- df %>% count(ID) %>% filter(n >= 2)

The vector df$ID should now contain only the IDs that met the conditions on at least two occasions. You can feed it back into your measurements data frame in several ways, but I'm partial to this:

measurements$new_var <- 0
# Index the new_var column (not the data frame's columns) by the matching rows
measurements$new_var[measurements$ID %in% df$ID] <- 1
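For what it's worth, the same idea can probably be done in one grouped pipeline without the intermediate df. This is only a sketch under some assumptions: the toy data below is made up to mirror the table in the question (with a second ID added for contrast), `cutoff` is a hypothetical stand-in for xxx, `qualifies` is a helper column I introduced, and "at least two occasions" is my reading of the requirement, so adjust the comparison if you need strictly more than two.

```r
library(dplyr)

# Toy data mirroring the question's table, plus a second ID for contrast (made up)
measurements <- data.frame(
  ID = c(1, 1, 1, 1, 1, 2, 2),
  measurementA = c(23, 22, 24, 21, 27, 10, 26),
  measurementB = c(24, 23, 22, 20, 29, 11, 12),
  dateofmeasurementsAandB = c(12, 12, 12, 12, 12, 5, 5),
  dateofmeasurementC = c(16, 15, 17, 11, 17, 9, 9)
)

cutoff <- 23  # hypothetical threshold standing in for xxx

flagged <- measurements %>%
  group_by(ID) %>%
  mutate(
    # TRUE for rows dated before Measurement C where A or B meets the cutoff
    qualifies = dateofmeasurementsAandB < dateofmeasurementC &
      (measurementA >= cutoff | measurementB >= cutoff),
    # 1 for every row of an ID with at least two qualifying rows, else 0
    new_var = as.integer(sum(qualifies) >= 2)
  ) %>%
  ungroup()
```

Unlike the subset() approach, this keeps every row of measurements, including the ones dated on or after Measurement C; they just don't count towards the total. With the toy data, ID 1 has four qualifying rows and ID 2 has one, so only ID 1 gets new_var = 1.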

Upvotes: 2
