Reputation: 1243
I have a data frame that contains several columns with dates
col1<-seq( as.Date("2011-07-01"), by=20, len=10)
col2<-seq( as.Date("2011-09-01"), by=7, len=10)
col3<-seq( as.Date("2011-08-01"), by=1, len=10)
data.frame(col1,col2,col3)
The data frame looks like this:
col1 col2 col3
1 2011-07-01 2011-09-01 2011-08-01
2 2011-07-21 2011-09-08 2011-08-02
3 2011-08-10 2011-09-15 2011-08-03
4 2011-08-30 2011-09-22 2011-08-04
5 2011-09-19 2011-09-29 2011-08-05
6 2011-10-09 2011-10-06 2011-08-06
7 2011-10-29 2011-10-13 2011-08-07
8 2011-11-18 2011-10-20 2011-08-08
9 2011-12-08 2011-10-27 2011-08-09
10 2011-12-28 2011-11-03 2011-08-10
I am trying to merge them into one column so that
A. Only the lowest (earliest) date remains per row and others get ignored
1 2011-07-01
2 2011-07-21
3 2011-08-03
4 2011-08-04
5 2011-08-05
6 2011-08-06
7 2011-08-07
8 2011-08-08
9 2011-08-09
10 2011-08-10
B. Only the highest (latest) date remains per row
1 2011-09-01
2 2011-09-08
3 2011-09-15
4 2011-09-22
5 2011-09-29
6 2011-10-09
7 2011-10-29
8 2011-11-18
9 2011-12-08
10 2011-12-28
The real dataset has NA
s so if NA
gets encountered it should be ignored unless all columns have a missing value for a particular row, in which case NA
will be generated there as well.
Any thoughts?
Upvotes: 1
Views: 1176
Reputation: 887118
We can use max.col
to find the index of the maximum values in each row, then cbind
with row index and get the value per each row, convert to a data.frame
j1 <- sapply(df1, as.numeric)
df2 <- data.frame(Date = df1[cbind(1:nrow(df1),max.col(j1, 'first') )])
df3 <- data.frame(Date = df1[cbind(1:nrow(df1), max.col(-1*j1, "first"))])
df2
# Date
#1 2011-09-01
#2 2011-09-08
#3 2011-09-15
#4 2011-09-22
#5 2011-09-29
#6 2011-10-09
#7 2011-10-29
#8 2011-11-18
#9 2011-12-08
#10 2011-12-28
df3
# Date
#1 2011-07-01
#2 2011-07-21
#3 2011-08-03
#4 2011-08-04
#5 2011-08-05
#6 2011-08-06
#7 2011-08-07
#8 2011-08-08
#9 2011-08-09
#10 2011-08-10
Or another option is
as.Date(apply(df1, 1, min, na.rm = TRUE))
as.Date(apply(df1, 1, max, na.rm = TRUE))
Or with tidyverse
library(tidyverse)
df1 %>%
rowwise() %>%
transmute(col1Max = max(col1, col2, col3), colMin = min(col1, col2, col3))
Upvotes: 1
Reputation: 93813
pmin
and pmax
are helpful here:
do.call(pmin, dat)
# [1] "2011-07-01" "2011-07-21" "2011-08-03" "2011-08-04" "2011-08-05"
# [6] "2011-08-06" "2011-08-07" "2011-08-08" "2011-08-09" "2011-08-10"
do.call(pmax, dat)
# [1] "2011-09-01" "2011-09-08" "2011-09-15" "2011-09-22" "2011-09-29"
# [6] "2011-10-09" "2011-10-29" "2011-11-18" "2011-12-08" "2011-12-28"
This also works for NA
values, like:
do.call(pmin, c(dat, na.rm=TRUE) )
You can also select the specific columns you want to analyse like:
do.call(pmin, c(dat[c("col1","col2","col3")], na.rm=TRUE) )
Upvotes: 7