Charlotte Jelleyman

Reputation: 97

How to calculate row sums or counts on selected columns with a condition using tidyverse?

I have the following data frame (a subset of a larger data frame with more than 3000 observations and two levels of year):

rp.pptn <- data.frame(id = c("150015", "150016", "150017", "150018", 
"150019", "150020"), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("15", "18"), class = "factor"), 
freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3), 
freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1), 
freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3), 
freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3), 
freqrain = c(1, 3, 2, 3, 1, 3))

For each row, I would like to count the values in columns 3:11 that satisfy a condition (equal to 1, 2 or 6). I have been trying rowSums because, when I drop the id and grouping variable (year), rowSums does give me counts, like so:

library(dplyr)

rp.pptn.no.id <- rp.pptn %>%
   select(3:11) %>%
   # "." is the piped data frame, so each comparison is element-wise
   mutate(pptnlow = rowSums(. == 1 | . == 2 | . == 6))

I have also been able to calculate row sums for selected columns, keeping id and year, as follows:

rp.pptn <- rp.pptn %>% 
   mutate(pptnlow = rowSums(.[c(3:11)]))

However, since I need id and year for subsequent analysis, I would like to do both steps in one go. I am also curious why, given that my data are numeric, rowSums in the first snippet gives me counts rather than sums. Counts are actually what I want, i.e. how many of the columns meet my criteria.
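The reason is that a comparison such as x == 1 produces TRUE/FALSE values, and rowSums treats TRUE as 1 and FALSE as 0, so summing the logicals counts the matches. A minimal base-R sketch on a small hypothetical matrix (m, hits, counts and totals are illustrative names, not from the question):

```r
# Small hypothetical matrix standing in for columns 3:11
m <- matrix(c(1, 2, 5,
              3, 6, 1), nrow = 2, byrow = TRUE)

# Element-wise comparisons give a logical matrix of the same shape
hits <- m == 1 | m == 2 | m == 6

# rowSums() treats TRUE as 1 and FALSE as 0, so this counts matches per row
counts <- rowSums(hits)
counts  # 2 2

# Summing the raw numbers instead gives ordinary row totals
totals <- rowSums(m)
totals  # 8 10
```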

Searching has made me think something based on this could work:

rp.pptn <- rp.pptn %>% 
  mutate(pptnlow = rowSums(.[3:11]) %in% c(1, 2, 6))

This returns a logical vector of FALSE values, presumably because my condition is not met somewhere. I don't think I'm far off, but what I ultimately want is the data frame below:

rp.pptn <- data.frame(id = c("150015", "150016", "150017", "150018", 
"150019", "150020"), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), 
.Label = c("15", "18"), class = "factor"), 
freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3), 
freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1), 
freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3), 
freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3), 
freqrain = c(1, 3, 2, 3, 1, 3), pptnlow = c(7, 6, 8, 4, 5, 2))
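For comparison, the %in% attempt fails because of the order of operations: rowSums(.[3:11]) adds up each row first, and %in% then asks whether each row total is 1, 2 or 6, which none of them are. Testing each cell first and summing afterwards reproduces the pptnlow column above. A base-R sketch (row_totals and counts are illustrative names):

```r
# Data from the question (columns 3:11 are the freq* variables)
rp.pptn <- data.frame(
  id = c("150015", "150016", "150017", "150018", "150019", "150020"),
  year = factor(rep("15", 6), levels = c("15", "18")),
  freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
  freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
  freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
  freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
  freqrain = c(1, 3, 2, 3, 1, 3)
)

# The attempt: rowSums runs first, so %in% tests each row's total
row_totals <- rowSums(rp.pptn[3:11])   # totals 18, 19, 17, 28, 19, 25
row_totals %in% c(1, 2, 6)             # all FALSE

# Testing each cell first, then summing the logicals, gives the counts
counts <- rowSums(sapply(rp.pptn[3:11], function(col) col %in% c(1, 2, 6)))
counts                                 # 7 6 8 4 5 2
```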

As mentioned, my actual data set is much bigger, so the more automation the better! Thank you.

Upvotes: 2

Views: 369

Answers (2)

www

Reputation: 39154

We can use mutate_at to replace each value in columns 3:11 with TRUE or FALSE depending on whether it is 1, 2 or 6, use rowSums to count the TRUE values in each row, and then bind the result to the original data frame.

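library(dplyr)

rp.pptn2 <- rp.pptn %>%
  # Replace each value in columns 3:11 with TRUE/FALSE (is it 1, 2 or 6?)
  mutate_at(vars(3:11), ~ . %in% c(1, 2, 6)) %>%
  # rowSums counts the TRUEs; transmute keeps only the new column
  transmute(pptnlow = rowSums(.[, 3:11])) %>%
  # Attach the counts to the untouched original data (id, year, freq*)
  bind_cols(rp.pptn, .)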
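In dplyr 1.0.0 and later, mutate_at() and vars() are superseded by across(). A sketch of the same count in a single mutate() call, assuming dplyr >= 1.0.0; it keeps id and year without a separate bind_cols() step:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Data from the question
rp.pptn <- data.frame(
  id = c("150015", "150016", "150017", "150018", "150019", "150020"),
  year = factor(rep("15", 6), levels = c("15", "18")),
  freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
  freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
  freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
  freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
  freqrain = c(1, 3, 2, 3, 1, 3)
)

# across() returns a data frame of TRUE/FALSE (is the value 1, 2 or 6?)
# for columns 3:11; rowSums() then counts the TRUEs in each row.
rp.pptn2 <- rp.pptn %>%
  mutate(pptnlow = rowSums(across(3:11, ~ .x %in% c(1, 2, 6))))

rp.pptn2$pptnlow  # 7 6 8 4 5 2
```

If all the target columns share the freq prefix, across(starts_with("freq"), ...) avoids hard-coding column positions on the larger data set.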

Upvotes: 2

akrun

Reputation: 886948

One option would be to use map with reduce:

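library(tidyverse)

# For each target value, compare columns 3:11 to it and add the
# resulting TRUE/FALSE columns together to get per-row counts
map(c(1, 2, 6), function(val) rp.pptn %>%
                  transmute_at(3:11, function(col) col == val) %>%
                  reduce(`+`)) %>%
  # Add the three per-value count vectors together
  reduce(`+`) %>%
  mutate(rp.pptn, pptnlow = .)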
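The reduce(`+`) steps work because a data frame is a list of columns: reduce() folds + over them pairwise, and TRUE/FALSE are coerced to 1/0 along the way. A minimal illustration with two hypothetical logical columns (cols and per_row are illustrative names):

```r
library(purrr)

# Two hypothetical logical columns, like those transmute_at() returns
# after comparing each column with one target value
cols <- list(a = c(TRUE, FALSE, TRUE),
             b = c(TRUE, TRUE, FALSE))

# reduce() folds `+` over the list (a + b here); TRUE/FALSE become 1/0,
# so the result is the number of TRUEs in each row
per_row <- reduce(cols, `+`)
per_row  # 2 1 1
```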

Or with rowSums and map:

map(c(1, 2, 6), ~ rp.pptn %>%
                  select(3:11) %>%
                  transmute(pptnlow = rowSums(. == .x))) %>%
  # bind_cols() may print a "New names" message, since every piece
  # has a column called pptnlow
  bind_cols() %>%
  rowSums() %>%
  mutate(rp.pptn, pptnlow = .)
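Both map() variants count each target value separately and then add the counts. That equals counting cells that match any of 1, 2 or 6 only because the targets are distinct, so a cell can match at most one of them. A base-R sketch on a small hypothetical data frame (df, targets, per_value, summed and direct are illustrative names) checks this:

```r
# Hypothetical 3-column data frame standing in for columns 3:11
df <- data.frame(x = c(1, 3, 6), y = c(2, 2, 5), z = c(6, 1, 4))
targets <- c(1, 2, 6)

# One count vector per target value, like the map() step
per_value <- lapply(targets, function(v) rowSums(df == v))

# Adding those vectors, like the bind_cols()/rowSums() or reduce(`+`) step
summed <- Reduce(`+`, per_value)

# Counting cells that match any target in a single pass
direct <- rowSums(sapply(df, function(col) col %in% targets))

# Both give 3, 2, 1 matches per row, because each cell can equal
# at most one of the (distinct) target values
stopifnot(all(summed == direct))
```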

Upvotes: 2
