Reputation: 640
I'm relatively new to R. I have a data frame in which I would like to create a variable whose value depends on conditions on other columns. Below is a sample of the data that I am working with.
cycle <- c("M", "O", "O", "O", "O", "M", "O")
irm <- c("200901", "200902", "200903", "200904", "200905", "200906", "200907")
itemcode <- c("611420B004A01", "611420B004A01", "611420B004A01", "611420B004A01", "611420B004A01", "611420B004A01", "611420B004A01")
price <- c(19.00, NA, NA, NA, NA, NA, NA)
dlq <- c(0, 0, 0, 0, 0, 1, 1)
df <- data.frame(itemcode, irm, price, cycle, dlq)
The dlq variable is conditional on the value of the cycle variable. I would like to define it such that, for every unique value of itemcode (I have 75,000 of them):
a. dlq = 1 if price is NA in a month where cycle is equal to M, or
b. dlq = 1 if cycle is equal to O, price is NA, and price was also NA in the most recent month where cycle = M;
c. dlq = 0 otherwise.
For example, dlq = 1 where irm = 200907 because price was NA in 200906, the most recent month where cycle = M, and price is also NA in 200907. I've tried using lead and lag variables, but the number of months between an M and an O is not constant. So, for rows where cycle = "O", I want dlq = 1 if and only if price is NA both in that row and in the last month where cycle = M. Is there a way to do this with ifelse or some other conditions? Any advice/help would be much appreciated. Thanks so much.
Upvotes: 1
Views: 782
Reputation: 506
> library('plyr');library('dplyr')
> df %>% tbl_df %>% mutate(dlq = ifelse((cycle == 'M' & is.na(price)) | ((cycle == 'O' & is.na(price)) & (cycle[nrow(.)] == 'M' & is.na(price[nrow(.)]))), 1, 0))
Source: local data frame [7 x 5]
itemcode irm price cycle dlq
(fctr) (fctr) (dbl) (fctr) (dbl)
1 611420B004A01 200901 19 M 0
2 611420B004A01 200902 NA O 0
3 611420B004A01 200903 NA O 0
4 611420B004A01 200904 NA O 0
5 611420B004A01 200905 NA O 0
6 611420B004A01 200906 NA M 1
7 611420B004A01 200907 NA O 0
This gives dlq = 0 for irm = 200907, where the question expects 1. Did I misunderstand the question?
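The expression above tests cycle[nrow(.)], which is the last row of the data frame, not the most recent M month before each row. A possible dplyr sketch of the rule as I read it from the question: number the blocks that start at each M month within an itemcode, then compare every row against the first row of its block. The column names m_block and m_missing and the object result are my own illustrative names, and the sketch assumes irm sorts chronologically as text (YYYYMM) and cycle has no missing values.

```r
library(dplyr)

# Sample data from the question (strings kept as character, not factor)
df <- data.frame(
  itemcode = rep("611420B004A01", 7),
  irm      = c("200901", "200902", "200903", "200904", "200905", "200906", "200907"),
  price    = c(19.00, NA, NA, NA, NA, NA, NA),
  cycle    = c("M", "O", "O", "O", "O", "M", "O"),
  stringsAsFactors = FALSE
)

result <- df %>%
  arrange(itemcode, irm) %>%
  group_by(itemcode) %>%
  # m_block increases by one at every "M" month; it is 0 before the first "M"
  mutate(m_block = cumsum(cycle == "M")) %>%
  group_by(itemcode, m_block) %>%
  mutate(
    # TRUE when the block starts with an "M" month whose price is missing
    m_missing = cycle[1] == "M" & is.na(price[1]),
    # "M" rows: flagged when their own price is NA (they are the block's first row)
    # "O" rows: flagged when price is NA here and in the most recent "M" month
    dlq = as.numeric(is.na(price) & cycle %in% c("M", "O") & m_missing)
  ) %>%
  ungroup()
```

On the sample data this should give dlq = 1 for 200906 and 200907 and 0 elsewhere, which is the pattern in the question's dlq column.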
Upvotes: 1
Reputation: 263301
> df$dlq[ is.na(df$price)&df$cycle=="M" ] <- 1
> df$dlq[ is.na(df$price) & df$cycle=="O" &
is.na( c(NA, head(df$price,-1))) & # The last two conditions use shifted values
c(FALSE, head(df$cycle,-1)=="M") ] <- 1
> df
cycle irm itemcode price dlq
1 M 200901 611420B004A01 19 0
2 O 200902 611420B004A01 NA 0
3 O 200903 611420B004A01 NA 0
4 O 200904 611420B004A01 NA 0
5 O 200905 611420B004A01 NA 0
6 M 200906 611420B004A01 NA 1
7 O 200907 611420B004A01 NA 1
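Note that the shifted comparison above only looks one row back, so it covers an O month directly after an M month. If several O months can follow an M, or if the data holds many itemcodes, something along these lines might work in base R: a sketch, not tested on the full 75,000 items. It uses ave() with cummax() to carry forward the row number of the most recent M month within each itemcode. The second item "ITEM2" and the helper names price_na, last_m and last_m_na are made up for illustration; the sketch assumes irm sorts chronologically as text and cycle has no missing values.

```r
# Sample data from the question plus a second, hypothetical item ("ITEM2")
df <- data.frame(
  itemcode = c(rep("611420B004A01", 7), rep("ITEM2", 4)),
  irm      = c("200901", "200902", "200903", "200904", "200905", "200906", "200907",
               "200901", "200902", "200903", "200904"),
  price    = c(19, NA, NA, NA, NA, NA, NA,
               NA, NA, 5, NA),
  cycle    = c("M", "O", "O", "O", "O", "M", "O",
               "M", "O", "O", "M"),
  stringsAsFactors = FALSE
)

# Put the rows of each item in chronological order
df <- df[order(df$itemcode, df$irm), ]

price_na <- is.na(df$price)

# Row number of the most recent "M" month within each item (0 if none so far)
last_m <- ave(ifelse(df$cycle == "M", seq_len(nrow(df)), 0L),
              df$itemcode, FUN = cummax)

# Was the price missing in that most recent "M" month?
# The leading FALSE handles rows with no earlier "M" month (index 0 + 1).
last_m_na <- c(FALSE, price_na)[last_m + 1L]

# Rule a: "M" month with missing price
# Rule b: "O" month with missing price, and missing price in the last "M" month
df$dlq <- as.numeric(price_na & (df$cycle == "M" | (df$cycle == "O" & last_m_na)))
```

For the item from the question this should reproduce 0, 0, 0, 0, 0, 1, 1. Everything is vectorised apart from the per-group cummax, so it avoids an explicit loop over itemcodes.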
Upvotes: 1