sum by group, but exclude values below designated id

Question

I'm trying to go from this:

#Starting df
name = c("tom", "tom", "tom", "chris", "chris", "chris", "jen", "jen", "jen") 
value = c(2,10,"end",45,"end",13,6,"end",13) 
df = data.frame(name,value)

or this (which uses NA as the cutoff)

#Starting df
name = c("tom", "tom", "tom", "chris", "chris", "chris", "jen", "jen", "jen") 
value = c(2,10,NA,45,NA,13,6,NA,13) 
starting_df = data.frame(name,value)

To this:

#Ending df
name = c("tom", "tom", "tom", "chris", "chris", "chris", "jen", "jen", "jen") 
value = c(12,12,12,45,45,45,6,6,6) 
ending_df = data.frame(name,value)

The idea here is to sum by group(name in this case), which I can easily do using group_by function from dplyr, but I need to remove all values beneath NAs or text that states end. The values beneath these cutoffs cannot be included in my sum. I've been working on this all night and am out of ideas. Thanks for your help.

akrun · Accepted Answer

After grouping by 'name', convert the 'value' to numeric (in case if the column is character, if it is factor, use as.numeric(as.character(value))) so that the non-numeric elements get converted to NA, then get the index of the first NA element with which and is.na, and find the sum of 'value' until that row

starting_df %>%
    group_by(name) %>%
    mutate(value = as.numeric(value), 
           value = sum(value[seq_len(which(is.na(value))[1])], na.rm = TRUE))
# A tibble: 9 x 2
# Groups:   name [3]
#    name value
#   
#1    tom    12
#2    tom    12
#3    tom    12
#4  chris    45
#5  chris    45
#6  chris    45
#7    jen     6
#8    jen     6
#9    jen     6

sum by group, but exclude values below designated id

Answers (2)

Related Questions