Filtering data based on multiple variables

Question

I am trying to create a new column based on criteria from other columns. My data looks like the following:

ID   Column 1    Column 2    Column 3 
 1     2            Y       "2013-10-22T10:09"
 1     2            Y       "2013-10-23T10:09" 
 2     3            N       "2013-10-23T10:09"
 3     0            Y       "2013-10-23T10:09"

For each ID, I would like to keep only the earliest date/time as long as column 1 is greater than 0 and column 2 is not N. The results would look like this:

 ID   Column 1    Column 2    Column 3             Column 4
  1     2            Y       "2013-10-22T10:09"    2013-10-22

I have tried the following, but I am not sure how to do it properly or whether there is a more elegant way:

library(dplyr)
ifelse(Column 1 >0 and Column 2 !="N",  
(new %>%
group_by(ID) %>%
arrange(Column 3) %>%
slice(1L)))
Column 4 <- as.Date(Column 3, format='%Y-%m-%dT%H:%M')

IceCreamToucan · Accepted Answer

library(dplyr)

df %>% 
  filter(Column1 > 0 & Column2 != 'N') %>% # filter out non-matching rows
  group_by(ID) %>% 
  top_n(-1, Column3) %>% # select only the row with the earliest date-time
  mutate(Date = as.Date(Column3)) # create date column

# # A tibble: 1 x 5
# # Groups:   ID [1]
#      ID Column1 Column2 Column3          Date      
# 1     1       2 Y       2013-10-22T10:09 2013-10-22
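
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same filter-then-pick-earliest idea. It rebuilds the sample data from the question and assumes syntactic column names (Column1, Column2, Column3) rather than the spaced names shown there. It uses slice_min() in place of top_n(), which is an assumption that dplyr 1.0.0 or later is installed, and passes an explicit format to as.Date() as the question did. Sorting the character column works here only because ISO 8601 timestamps sort chronologically as strings.

```r
library(dplyr)

# Sample data mirroring the question (column names without spaces are an assumption)
df <- tibble(
  ID      = c(1, 1, 2, 3),
  Column1 = c(2, 2, 3, 0),
  Column2 = c("Y", "Y", "N", "Y"),
  Column3 = c("2013-10-22T10:09", "2013-10-23T10:09",
              "2013-10-23T10:09", "2013-10-23T10:09")
)

result <- df %>%
  filter(Column1 > 0, Column2 != "N") %>%            # keep rows meeting both criteria
  group_by(ID) %>%
  slice_min(Column3, n = 1, with_ties = FALSE) %>%   # earliest timestamp per ID
  ungroup() %>%
  mutate(Date = as.Date(Column3, format = "%Y-%m-%dT%H:%M"))  # date part only

print(result)
```

Setting with_ties = FALSE guarantees at most one row per ID even if two rows share the same earliest timestamp; drop it if you want to keep ties, which is closer to what top_n() does.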
