Reputation: 712
I have a simple dataframe in R
df1 <- data.frame(
questionID = c(1,1,3,4,5,5),
userID = c(101, 101, 102, 101, 102,101),
Value=c(10,20,30,40,50,10))
The basic idea is to have a column that indicates the sum of value for a user on questions they asked before (lower number questions).
I tried using this function (after trying the pipe of sum which just gave errors about non-numeric that everybody seems to face)
f2 <- function(x){
Value_out <- filter(df1,questionID<x['questionID'] & userID == x['userID'] ) %>%
select(Value) %>%
summarize_if(is.numeric, sum, na.rm=TRUE)
}
out=mutate(df1,Expert=apply(df1, 1,f2))
While this works if you print it out, the Expert column is saved as a list of dataframes. All I want is an int or numeric of the sum of Value. Is there anyway to do this? By the way, yes, I've looked all over for ways to do this, with most answers just summarizing the column in a manner that won't work for me.
Upvotes: 0
Views: 109
Reputation: 388807
Using dplyr
and purrr::map_dbl
one approach would be to group_by
userID
and sum
Value
for each questionID
which is less than current value.
library(dplyr)
df1 %>%
group_by(userID) %>%
mutate(Expert = purrr::map_dbl(questionID, ~sum(Value[questionID < .x])))
# questionID userID Value Expert
# <dbl> <dbl> <dbl> <dbl>
#1 1 101 10 0
#2 1 101 20 0
#3 3 102 30 0
#4 4 101 40 30
#5 5 102 50 30
#6 5 101 10 70
Upvotes: 1
Reputation: 1338
I think I would avoid writing my own function altogether and use data.table
on this one. You can do what you want in just a couple lines, and I wouldn't be surprised if there was a way to golf this down to fewer lines
Given your same data, we create a data.table
object:
library(data.table)
dt <- data.table(
questionID = c(1,1,3,4,5,5),
userID = c(101, 101, 102, 101, 102,101),
Value=c(10,20,30,40,50,10))
Next, we shift
our values by 1 (lag) within each userID
:
dt[, lastVal := shift(Value, n = 1, fill = 0), by = .(userID)]
And finally, we cumsum
those by userID
, and replace those with multiple Value
s with the same userID
x questionID
with the min Expert
, which should be 0 because we used fill = 0
in shift
above before we cumsum
:
dt[,
Expert := cumsum(lastVal),
by = .(userID)][,
Expert := min(Expert),
by = .(userID, questionID)]
So, putting that all together, we have:
library(data.table)
dt <- data.table(
questionID = c(1,1,3,4,5,5),
userID = c(101, 101, 102, 101, 102,101),
Value=c(10,20,30,40,50,10))
dt[, lastVal := shift(Value, n = 1, fill = 0), by = .(userID)]
dt[,
Expert := cumsum(lastVal),
by = .(userID)][,
Expert := min(Expert),
by = .(userID, questionID)]
dt
questionID userID Value lastVal Expert
1: 1 101 10 0 0
2: 1 101 20 10 0
3: 3 102 30 0 0
4: 4 101 40 20 30
5: 5 102 50 30 30
6: 5 101 10 40 70
Upvotes: 1