Nick Knauer

Reputation: 4243

Calculate Sums from 1 Column Up to Value in Another, then Total Sum

I have a dataset that looks like this:

User <- c("User1", "User1", "User1", "User1", "User1", "User1", "User1", "User2", "User2", "User2", "User2", "User2", "User2", "User2")
Touchpoints <- c("A", "B", "C", "F", "D", "E", "H", "A", "B", "K", "D", "E", "F", "M")
Conversion <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1)
Frequency <- c(1, 2, 3, 0, 4, 5, 0, 1, 2, 0, 0, 3, 4, 5)
df <- data.frame(User, Touchpoints, Conversion, Frequency)
df$Exponential <- ifelse(df$Frequency > 0, exp(df$Frequency), 0)

df
    User Touchpoints Conversion Frequency Exponential
1  User1           A          0         1    2.718282
2  User1           B          0         2    7.389056
3  User1           C          0         3   20.085537
4  User1           F          1         0    0.000000
5  User1           D          0         4   54.598150
6  User1           E          0         5  148.413159
7  User1           H          1         0    0.000000
8  User2           A          0         1    2.718282
9  User2           B          0         2    7.389056
10 User2           K          1         0    0.000000
11 User2           D          1         0    0.000000
12 User2           E          0         3   20.085537
13 User2           F          0         4   54.598150
14 User2           M          1         5  148.413159

Here is what I am trying to do:

For each conversion, I want to divide each row's Exponential value by the user's cumulative sum of the Exponential column from that user's first row up to and including the conversion row, producing one _Conv column per conversion (rows after that conversion get 0). Here is an example:

    User Touchpoints Conversion Frequency Exponential Sum_Exp_for_Conv1 1st_Conv Sum_Exp_for_Conv2 2nd_Conv
1  User1           A          0         1    2.718282            30.192   0.0900           233.204   0.0116
2  User1           B          0         2    7.389056            30.192   0.2447           233.204   0.0317
3  User1           C          0         3   20.085537            30.192   0.6652           233.204   0.0861
4  User1           F          1         0    0.000000                 0   0.0000           233.204        0
5  User1           D          0         4   54.598150                 0   0.0000           233.204   0.2341
6  User1           E          0         5  148.413159                 0   0.0000           233.204   0.6364
7  User1           H          1         0    0.000000                 0   0.0000                 0        0
8  User2           A          0         1    2.718282            10.107   0.2689            10.107   0.2689
9  User2           B          0         2    7.389056            10.107   0.7311            10.107   0.7311
10 User2           K          1         0    0.000000                 0   0.0000                 0        0
11 User2           D          1         0    0.000000                 0   0.0000                 0        0
12 User2           E          0         3   20.085537                 0   0.0000                 0        0
13 User2           F          0         4   54.598150                 0   0.0000                 0        0
14 User2           M          1         5  148.413159                 0   0.0000                 0        0

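To make the first _Conv column concrete, here is how I compute it by hand for User1 (base R, for illustration only; `first_conv` and `denom` are just throwaway names):

```r
User1_rows <- df[df$User == "User1", ]

first_conv <- which(User1_rows$Conversion == 1)[1]   # row of the 1st conversion (4)
denom <- sum(User1_rows$Exponential[1:first_conv])   # cumulative Exponential up to it

# rows up to the conversion get Exponential / denom; rows after it get 0
conv1 <- ifelse(seq_len(nrow(User1_rows)) <= first_conv,
                User1_rows$Exponential / denom, 0)
round(conv1, 4)
# 0.0900 0.2447 0.6652 0.0000 0.0000 0.0000 0.0000
```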
There will be cases with over 100 conversions per user, and doing it this way would create thousands of columns, so it does not seem scalable.

My ultimate goal is to sum all of the _Conv columns into one final column called Final_Conv. For this example, the final output would look like this:

    User Touchpoints Conversion Frequency Final_Conv
1  User1           A          0         1     0.1017
2  User1           B          0         2     0.2764
3  User1           C          0         3     0.7514
4  User1           F          1         0          0
5  User1           D          0         4     0.2341
6  User1           E          0         5     0.6364
7  User1           H          1         0          0
8  User2           A          0         1     0.5379
9  User2           B          0         2     1.4621
10 User2           K          1         0          0
11 User2           D          1         0          0
12 User2           E          0         3          0
13 User2           F          0         4          0
14 User2           M          1         5          0

Any help would be great, thanks!

Upvotes: 1

Views: 56

Answers (1)

acylam

Reputation: 18661

Probably not the simplest code, but we can do something like the following:

library(dplyr)
library(tidyr)

df %>%
  group_by(User) %>%
  mutate(row_id = row_number(),
         conv_id = cumsum(Conversion),          # conversion "segment" each row belongs to
         exp_cumsum = cumsum(Exponential)) %>%  # running sum of Exponential per user
  group_by(conv_id, add = TRUE) %>%             # `.add = TRUE` in dplyr >= 1.0
  mutate(sum_of_exp = ifelse(n() == 1, NA, last(exp_cumsum))) %>%  # segment denominator
  spread(conv_id, sum_of_exp, sep = "_") %>%    # one conv_id_* column per conversion
  arrange(User, row_id) %>%
  fill(!!!vars(starts_with("conv_id")), .direction = "up") %>%
  mutate_at(vars(starts_with("conv_id")), funs(Exponential/.)) %>%  # funs() is superseded by list() in newer dplyr
  ungroup() %>%
  mutate(Final_Conv = rowSums(.[-(1:7)], na.rm = TRUE)) %>%
  select(1:4, Final_Conv)

Notes:

1. I first created cumulative sums of Conversion (conv_id) and Exponential (exp_cumsum), added conv_id as an additional grouping variable, and replaced all values in each User + conv_id combination with the last value of exp_cumsum.
2. Then, I spread the conv_id and sum_of_exp columns and filled each conv_id_ column up.
3. Finally, I used mutate_at to divide Exponential by each of the conv_id_ columns and created Final_Conv by summing all resulting conv_id_ columns with rowSums.
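To see what the first step produces, you can run just the cumulative-sum part on its own (a sketch of the intermediate columns, using `df` as defined in the question):

```r
library(dplyr)

df %>%
  group_by(User) %>%
  mutate(conv_id    = cumsum(Conversion),   # conversion segment index
         exp_cumsum = cumsum(Exponential))  # running sum within each User
# conv_id for User2 is 0 0 1 2 2 2 3, so rows 12-14 fall after the
# second conversion, and exp_cumsum gives each segment's denominator
```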

This solution should work for an arbitrary number of Conversions for each User.

Result:

# A tibble: 14 x 5
     User Touchpoints Conversion Frequency Final_Conv
   <fctr>      <fctr>      <dbl>     <dbl>      <dbl>
 1  User1           A          0         1  0.1016868
 2  User1           B          0         2  0.2764134
 3  User1           C          0         3  0.7513695
 4  User1           F          1         0  0.0000000
 5  User1           D          0         4  0.2341217
 6  User1           E          0         5  0.6364086
 7  User1           H          1         0  0.0000000
 8  User2           A          0         1  0.3010000
 9  User2           B          0         2  0.8182029
10  User2           K          1         0  0.0000000
11  User2           D          1         0  0.0000000
12  User2           E          0         3  0.2368828
13  User2           F          0         4  0.6439143
14  User2           M          1         5  0.0000000

Upvotes: 1
