Reputation: 69
been stuck on this for a while.
For every row, I'm trying to count across the columns for all values greater than 0. But the caveat is that I need to specify the starting column to start counting across using each row's specific value in another column.
For example the table would look like this:
ID | StartWeek | 1 | 2 | 3 |
123 2 3 0 1
456 1 1 0 1
Expected output would look like this:
ID | StartWeek | 1 | 2 | 3 | CountRow |
123 2 3 0 1 1
456 1 1 0 1 2
I tried something like this:
df <- df %>%
mutate(CountRow = rowSums(.[StartWeek:5] > 0))
But it just gives me the entire column instead of the individual value of each row. I think I read upon a potential solution using groupby() but would there by another way of doing this just by accessing every rows certain value instead of calling the entire column.
Upvotes: 0
Views: 1047
Reputation: 388862
For each row we can count how many values after StartWeek
are greater than 0. In dplyr
we can use rowwise
:
library(dplyr)
df %>%
rowwise() %>%
mutate(CountRow = { tmp <- c_across(`1`:`3`);
sum(tmp[StartWeek:length(tmp)] > 0)
})
# ID StartWeek `1` `2` `3` CountRow
# <int> <int> <int> <int> <int> <int>
#1 123 2 3 0 1 1
#2 456 1 1 0 1 2
Upvotes: 1
Reputation: 389
One way is to convert the dataframe to long format and filter out the unnecessary cells based on StartWeek
before counting.
library(tidyverse)
df <- tribble(
~ID, ~StartWeek, ~"1", ~"2", ~"3",
123L, 2L, 3L, 0L, 1L,
456L, 1L, 1L, 0L, 1L
)
df %>% pivot_longer(cols=-c(ID, StartWeek)) %>%
mutate(name=as.integer(name)) %>% filter(name>=StartWeek, value>0) %>%
group_by(ID) %>% summarize(CountRow=n(), .groups="drop") %>%
left_join(df, ., by="ID")
#> # A tibble: 2 x 6
#> ID StartWeek `1` `2` `3` CountRow
#> <int> <int> <int> <int> <int> <int>
#> 1 123 2 3 0 1 1
#> 2 456 1 1 0 1 2
Created on 2021-03-11 by the reprex package (v1.0.0)
Upvotes: 1