Reputation: 343
I'm trying to go from this:
#Starting df
name = c("tom", "tom", "tom", "chris", "chris", "chris", "jen", "jen", "jen")
value = c(2,10,"end",45,"end",13,6,"end",13)
df = data.frame(name,value)
or this (which uses NA as the cutoff)
#Starting df
name = c("tom", "tom", "tom", "chris", "chris", "chris", "jen", "jen", "jen")
value = c(2,10,NA,45,NA,13,6,NA,13)
starting_df = data.frame(name,value)
To this:
#Ending df
name = c("tom", "tom", "tom", "chris", "chris", "chris", "jen", "jen", "jen")
value = c(12,12,12,45,45,45,6,6,6)
ending_df = data.frame(name,value)
The idea here is to sum by group(name in this case), which I can easily do using group_by
function from dplyr
, but I need to remove all values beneath NAs
or text that states end
. The values beneath these cutoffs cannot be included in my sum. I've been working on this all night and am out of ideas. Thanks for your help.
Upvotes: 4
Views: 126
Reputation: 389235
A base R option would be using ave
where for each group (name
) we find the index where the first NA
occurs using which.max
and sum values
until that index.
starting_df$value <- ave(starting_df$value, starting_df$name, FUN = function(x)
sum(x[1:which.max(is.na(x)) - 1]))
starting_df
# name value
#1 tom 12
#2 tom 12
#3 tom 12
#4 chris 45
#5 chris 45
#6 chris 45
#7 jen 6
#8 jen 6
#9 jen 6
As @thelatemail commented, we can also do this using cumsum
. We select the maximum value of cumsum
from each group ignoring the NA
s.
starting_df$value <- ave(starting_df$value, starting_df$name, FUN = function(x)
max(cumsum(x), na.rm = TRUE))
Upvotes: 2
Reputation: 887851
After grouping by 'name', convert the 'value' to numeric
(in case if the column is character
, if it is factor
, use as.numeric(as.character(value))
) so that the non-numeric elements get converted to NA
, then get the index of the first NA element with which
and is.na
, and find the sum
of 'value' until that row
starting_df %>%
group_by(name) %>%
mutate(value = as.numeric(value),
value = sum(value[seq_len(which(is.na(value))[1])], na.rm = TRUE))
# A tibble: 9 x 2
# Groups: name [3]
# name value
# <fctr> <dbl>
#1 tom 12
#2 tom 12
#3 tom 12
#4 chris 45
#5 chris 45
#6 chris 45
#7 jen 6
#8 jen 6
#9 jen 6
Upvotes: 2