avis88

Reputation: 13

How do you select the max of one column where another column is not NA in R?

I'm looking for a way in R to select, within each Year, the row with max(col1) among the rows where col2 is not NA.

Example data frame named df1:

#df1
Year  col1  col2 
2016   4     NA  # has NA
2016   2     NA  # has NA
2016   1     3  # this is the max for 2016
2017   3     NA
2017   2     3   # this is the max for 2017
2017   1     3
2018   2     4   # this is the max for 2018
2018   1     NA

I would like the new dataset to contain only these rows:

Year  col1  col2 
2016   1     3
2017   2     3
2018   2     4

If anyone can help, it would be much appreciated.

Upvotes: 1

Views: 114

Answers (2)

Gregor Thomas

Reputation: 145765

Using dplyr:

library(dplyr)
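df1 %>%
  filter(!is.na(col2)) %>%   # drop rows where col2 is NA
  group_by(Year) %>%         # column names are case-sensitive: Year, not year
  arrange(desc(col1)) %>%    # largest col1 first
  slice(1)                   # keep the top row of each Year group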
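
For anyone who wants to try this without an existing df1, here is a minimal self-contained sketch of the same idea. It rebuilds the question's data by hand and swaps the arrange/slice pair for slice_max(), which assumes dplyr 1.0.0 or later; the name `result` is just for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
df1 <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018),
  col1 = c(4, 2, 1, 3, 2, 1, 2, 1),
  col2 = c(NA, NA, 3, NA, 3, 3, 4, NA)
)

result <- df1 %>%
  filter(!is.na(col2)) %>%                         # keep rows where col2 is present
  group_by(Year) %>%                               # one group per Year
  slice_max(col1, n = 1, with_ties = FALSE) %>%    # row with the largest col1 (assumes dplyr >= 1.0.0)
  ungroup()

print(result)
```

Setting with_ties = FALSE keeps exactly one row per Year even if two rows share the maximum col1; drop that argument if you would rather see all tied rows.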

Using data.table:

library(data.table)
setDT(df1)
df1[!is.na(col2), .SD[which.max(col1)], by = Year]

This works in a fresh R session:

library(data.table)
dt = fread("Year  col1  col2 
2016   4     NA
2016   2     NA
2016   1     3
2017   3     NA
2017   2     3
2017   1     3
2018   2     4
2018   1     NA")

dt[!is.na(col2), .SD[which.max(col1)], by = Year]
#    Year col1 col2
# 1: 2016    1    3
# 2: 2017    2    3
# 3: 2018    2    4

Upvotes: 1

markus

Reputation: 26343

In base R:

out <- na.omit(df1)
merge(aggregate(col1 ~ Year, out, max), out) # thanks to Rui
#  Year col1 col2
#1 2016    1    3
#2 2017    2    3
#3 2018    2    4
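
One thing to be aware of: because merge() joins on both Year and col1, rows that tie for the maximum col1 within a Year would all be returned. If exactly one row per Year is wanted, a possible base R variant is to sort and then drop repeated years. This is only a sketch on the question's data; `kept` and `result` are illustrative names.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  Year = c(2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018),
  col1 = c(4, 2, 1, 3, 2, 1, 2, 1),
  col2 = c(NA, NA, 3, NA, 3, 3, 4, NA)
)

# Keep only rows where col2 is not NA
kept <- df1[!is.na(df1$col2), ]

# Sort by Year, then by col1 descending, so each year's max comes first
kept <- kept[order(kept$Year, -kept$col1), ]

# duplicated() flags every row after the first for a given Year
result <- kept[!duplicated(kept$Year), ]

print(result)
```

The row names in `result` are the original row positions in df1, which can be handy for checking which rows were picked.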

Upvotes: 4
