How do you select a max of one column and not NA's in another column in R?

Question

I'm looking for a way in R to select, for each Year, the row with max(col1) among the rows where col2 is not NA.

Example data frame named df1

#df1
Year  col1  col2 
2016   4     NA  # has NA
2016   2     NA  # has NA
2016   1     3  # this is the max for 2016
2017   3     NA
2017   2     3   # this is the max for 2017
2017   1     3
2018   2     4   # this is the max for 2018
2018   1     NA

I would like the new dataset to only return

Year  col1  col2 
2016   1     3
2017   2     3
2018   2     4

If anyone can help, it would be much appreciated.

Gregor Thomas · Accepted Answer

Using dplyr:

library(dplyr)
df1 %>% filter(!is.na(col2)) %>%
  group_by(Year) %>%
  arrange(desc(col1)) %>%
  slice(1)
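
A self-contained sketch of the same idea, assuming dplyr 1.0.0 or later, where slice_max() can stand in for the arrange() plus slice(1) pair. The df1 construction below just re-types the example data from the question, and with_ties = FALSE is an added choice so that a tied maximum still returns one row per Year.

```r
library(dplyr)

# Re-create the example data from the question
df1 <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018),
  col1 = c(4, 2, 1, 3, 2, 1, 2, 1),
  col2 = c(NA, NA, 3, NA, 3, 3, 4, NA)
)

result <- df1 %>%
  filter(!is.na(col2)) %>%                        # drop rows where col2 is missing
  group_by(Year) %>%                              # work within each year
  slice_max(col1, n = 1, with_ties = FALSE) %>%   # keep the row with the largest col1
  ungroup()

result
```

Filtering before grouping matters here: if the NA rows were left in, the 2016 maximum would be the col1 = 4 row, which is not the one wanted.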

Using data.table:

library(data.table)
setDT(df1)
df1[!is.na(col2), .SD[which.max(col1)], by = Year]

This works in a fresh R session:

library(data.table)
dt = fread("Year  col1  col2 
2016   4     NA
2016   2     NA
2016   1     3
2017   3     NA
2017   2     3
2017   1     3
2018   2     4
2018   1     NA")

dt[!is.na(col2), .SD[which.max(col1)], by = Year]
#    Year col1 col2
# 1: 2016    1    3
# 2: 2017    2    3
# 3: 2018    2    4
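
If neither package is available, the same filter-then-pick-the-max logic can be sketched in base R. This is not part of the original answer; the names ok, pieces, and base_result are only illustrative, and df1 is re-typed from the question's example.

```r
# Re-create the example data from the question
df1 <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018),
  col1 = c(4, 2, 1, 3, 2, 1, 2, 1),
  col2 = c(NA, NA, 3, NA, 3, 3, 4, NA)
)

# Keep only rows where col2 is present
ok <- df1[!is.na(df1$col2), ]

# Split by Year, then take the row with the largest col1 in each piece
# (which.max() returns the first position of the maximum, so ties yield one row)
pieces <- lapply(split(ok, ok$Year), function(d) d[which.max(d$col1), ])

# Stack the per-year rows back into one data frame
base_result <- do.call(rbind, pieces)
rownames(base_result) <- NULL

base_result
```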
