Reputation: 15
This is my problem I have a dataframe like this
df=data.frame(v1=c(24,15, 0, 7,36,10), c1=c(22,15,0,0,28,11), v2=c(0,10,0,19,0,0), c2=c(0,7,0,22,0,0), v3=c(54,22,28,55,62,38), c3=c(44,23,22,66,71,44), cut=c(20,35,40,15,40,30))
That looks like this
v1 c1 v2 c2 v3 c3 cut
1 24 22 0 0 54 44 20
2 15 15 10 7 22 23 35
3 0 0 0 0 28 22 40
4 7 0 19 22 55 66 15
5 36 28 0 0 62 71 40
6 10 11 0 0 38 44 30
I need to create a new column with adds the values of columns v1,v2,v3 (there are more in the real case), but only when they are above the value in the column "cut"
So, in this case the column "sum" would be:
sum
78
0
0
64
62
38
I hope this is not obvious.
Upvotes: 0
Views: 7070
Reputation: 6535
R> df$sum = apply(df[c(grep("v", names(df), value = TRUE), "cut")], 1, function(y) sum(y[y > y["cut"]]) )
R> df
v1 c1 v2 c2 v3 c3 cut sum
1 24 22 0 0 54 44 20 78
2 15 15 10 7 22 23 35 0
3 0 0 0 0 28 22 40 0
4 7 0 19 22 55 66 15 74
5 36 28 0 0 62 71 40 62
6 10 11 0 0 38 44 30 38
Here's a mapply approach
R> df$sum = do.call(mapply, c(function(...) { y = c(...); sum(y[y > y["cut"]]) }, df[c(grep("v", names(df), value = TRUE), "cut")]) )
R> df
v1 c1 v2 c2 v3 c3 cut sum
1 24 22 0 0 54 44 20 78
2 15 15 10 7 22 23 35 0
3 0 0 0 0 28 22 40 0
4 7 0 19 22 55 66 15 74
5 36 28 0 0 62 71 40 62
6 10 11 0 0 38 44 30 38
Upvotes: 2