ds_student
ds_student

Reputation: 183

R Find minimum Date Value for Dataframe column

I have the following dataframe:

df <- data.frame(
                "DateValue" = c("2016-07-01", "2016-07-02", "2016-07-03", 
                "2016-07-04","2016-07-05", "2016-07-06","2016-07-07"),  
                "Age1" = c(rep(NA, 2), seq(14,18,2), NA, NA), 
                "Age2" = c(rep(NA, 1), seq(18,28,2)),  
                "Wages"= c(1000,1200,2100,5000,3000,2220,300)
)

I am trying to find the min dates for all columns beginning with "Age". Below is a representative output:

 output <- ['2016-07-03', '2016-07-02']

I reviewed all the SO solutions that provide the max and min dates for a Dataframe with multiple IDs using the dplyr package but just cannot figure out how to apply the same to my situation. Below is a SO post that is related. All solutions for similar problems use the same package:

Retain only the most recent date for every ID in a data frame

Upvotes: 0

Views: 1385

Answers (1)

Dave2e
Dave2e

Reputation: 24079

See comments for details:

df <- data.frame(
  "DateValue" = c("2016-07-01", "2016-07-02", "2016-07-03", 
                  "2016-07-04","2016-07-05", "2016-07-06","2016-07-07"),  
  "Age1" = c(rep(NA, 2), seq(14,18,2), NA, NA), 
  "Age2" = c(rep(NA, 1), seq(18,28,2)),  
  "Wages"= c(1000,1200,2100,5000,3000,2220,300)
)

#convert to date
df$DateValue<-as.Date(df$DateValue)
library(dplyr)
#select the proper columns, filter columns and return min date
answer<-sapply(df %>% select(starts_with("Age")),  function(x){ min(df$DateValue[!is.na(x)])})
#convert from serial number back to a date
answer<-as.Date(answer, "1970-01-01")
answer

Age1         Age2 
"2016-07-03" "2016-07-02" 

Upvotes: 2

Related Questions