Reputation: 183
I have the following dataframe:
df <- data.frame(
"DateValue" = c("2016-07-01", "2016-07-02", "2016-07-03",
"2016-07-04","2016-07-05", "2016-07-06","2016-07-07"),
"Age1" = c(rep(NA, 2), seq(14,18,2), NA, NA),
"Age2" = c(rep(NA, 1), seq(18,28,2)),
"Wages"= c(1000,1200,2100,5000,3000,2220,300)
)
I am trying to find the min dates for all columns beginning with "Age". Below is a representative output:
output <- ['2016-07-03', '2016-07-02']
I reviewed all the SO solutions that provide the max and min dates for a Dataframe with multiple IDs using the dplyr package but just cannot figure out how to apply the same to my situation. Below is a SO post that is related. All solutions for similar problems use the same package:
Retain only the most recent date for every ID in a data frame
Upvotes: 0
Views: 1385
Reputation: 24079
See comments for details:
df <- data.frame(
"DateValue" = c("2016-07-01", "2016-07-02", "2016-07-03",
"2016-07-04","2016-07-05", "2016-07-06","2016-07-07"),
"Age1" = c(rep(NA, 2), seq(14,18,2), NA, NA),
"Age2" = c(rep(NA, 1), seq(18,28,2)),
"Wages"= c(1000,1200,2100,5000,3000,2220,300)
)
#convert to date
df$DateValue<-as.Date(df$DateValue)
library(dplyr)
#select the proper columns, filter columns and return min date
answer<-sapply(df %>% select(starts_with("Age")), function(x){ min(df$DateValue[!is.na(x)])})
#convert from serial number back to a date
answer<-as.Date(answer, "1970-01-01")
answer
Age1 Age2
"2016-07-03" "2016-07-02"
Upvotes: 2