purple_plop

Reputation: 280

Summarize and NAs

My code broke, and it seems to be caused by NAs in summarize. I joined two data frames, and because their dates don't fully line up, the joined result contains NAs.

My join:

data <- dplyr::right_join(ny.t, c.p, by=c("Date", "State"))

My code:

top.5 <- data %>% group_by(State) %>% summarize(Infected = max(Deaths) + max(Positive)) %>%
arrange(desc(Infected)) %>% top_n(5) 

How can I fix this?

Upvotes: 1

Views: 240

Answers (1)

akrun

Reputation: 887058

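We could add a condition so that if all the values of deaths in a group are NA, the summary returns 0; otherwise it returns the max with the NAs removed. (Without the check, max(x, na.rm = TRUE) on an all-NA group returns -Inf with a warning.)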

library(dplyr)
data %>%
    group_by(state) %>% 
    summarise(max_deaths = if(all(is.na(deaths))) 0 else max(deaths, na.rm = TRUE),
              max_positive = if(all(is.na(positive))) 0 else max(positive, na.rm = TRUE),
              max_negative = if(all(is.na(negative))) 0 else max(negative, na.rm = TRUE))
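
To see why the guard is needed, here is a minimal base R sketch of how max treats NAs. The vectors x and all_na and the helper name safe_max are made up for illustration; they are not from the question's data.

```r
# Hypothetical vectors, only for illustration
x      <- c(3, NA, 7)
all_na <- c(NA_real_, NA_real_)

max(x)                # NA: a single NA poisons the result
max(x, na.rm = TRUE)  # 7

# With every value missing, na.rm = TRUE leaves nothing to compare,
# so max() returns -Inf and emits a warning (suppressed here)
suppressWarnings(max(all_na, na.rm = TRUE))  # -Inf

# The same condition used in summarise(), wrapped in a helper
safe_max <- function(v) if (all(is.na(v))) 0 else max(v, na.rm = TRUE)

safe_max(x)       # 7
safe_max(all_na)  # 0
```

Whether 0 is the right fallback depends on the analysis; returning NA_real_ instead would keep "no data" distinct from "zero cases".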

Or use summarise_at to apply the same check to all three columns at once

data %>%
    group_by(state) %>%
    summarise_at(vars(deaths, positive, negative),
       ~ if(all(is.na(.))) 0 else max(., na.rm = TRUE))
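
In dplyr 1.0.0 and later, summarise_at is superseded by across(), so the same idea could be sketched as below. The toy data frame, the helper safe_max, and the infected column are assumptions for illustration (the lowercase column names follow the code above), and slice_max is swapped in for the question's top_n, which is also superseded.

```r
library(dplyr)

# Hypothetical toy data standing in for the joined data frame
toy <- tibble(
  state    = c("NY", "NY", "CA", "CA"),
  deaths   = c(10, NA, NA, NA),
  positive = c(100, 150, NA, 20),
  negative = c(NA, 300, 50, NA)
)

# Return 0 for an all-NA group, otherwise the max ignoring NAs
safe_max <- function(v) if (all(is.na(v))) 0 else max(v, na.rm = TRUE)

# Apply the same summary to several columns at once
result <- toy %>%
  group_by(state) %>%
  summarise(across(c(deaths, positive, negative), safe_max))

# Rebuild the question's ranking on top of the NA-safe maxima
top <- result %>%
  mutate(infected = deaths + positive) %>%
  slice_max(infected, n = 5)

print(top)
```

slice_max already returns rows in descending order of infected, so a separate arrange(desc(...)) is not needed here.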

Upvotes: 1
