Reputation: 13
Say I have a data matrix that looks like this:
X1 X2 X3
1 1 869 1956.78
2 1 869 2006.52
3 1 869 1592.80
4 1 869 6620.00
5 1 869 228.22
6 1 869 110.15
7 1 869 1350.79
8 1 869 78.36
9 1 869 118.61
10 1 869 453.99
11 1 869 5393.16
12 1 869 1641.34
13 1 869 885.80
14 1 869 352.80
15 1 869 105.64
16 2 1 0.00
17 2 313 0.00
18 2 467 0.00
19 2 495 0.00
20 2 135 0.00
21 2 769 0.00
22 2 770 0.00
23 2 771 0.00
24 1 869 375.60
25 2 869 0.00
26 2 869 0.00
27 2 869 0.00
28 2 1 0.00
29 2 1 0.00
30 2 467 0.00
I want to format it as so:
X1 X2 X3
1 1 869 22894.96
2 2 1 0.00
3 2 313 0.00
4 2 467 0.00
5 2 495 0.00
6 2 135 0.00
7 2 769 0.00
8 2 770 0.00
9 2 771 0.00
10 1 869 375.60
11 2 869 0.00
12 2 1 0.00
13 2 467 0.00
So basically, I want to remove consecutive repetitions in X2
and X1
and then sum the associated values in X3
.
The seems like a very complex idea to me, perhaps there is an elegant solution. I know that if X3 is not zero, then the corresponding value in X1 is "1". So I can determine where this happens by using placement:
placement <- grep(1, df$X1)
I'm not sure how to continue from here.
Upvotes: 1
Views: 58
Reputation: 1
Call the dataset t.
X1 X2 X3
1 1 869 1956.78
2 1 869 2006.52
3 1 869 1592.80
4 1 869 6620.00
5 1 869 228.22
6 1 869 110.15
7 1 869 1350.79
8 1 869 78.36
9 1 869 118.61
10 1 869 453.99
11 1 869 5393.16
12 1 869 1641.34
13 1 869 885.80
14 1 869 352.80
15 1 869 105.64
16 2 1 0.00
17 2 313 0.00
18 2 467 0.00
19 2 495 0.00
20 2 135 0.00
21 2 769 0.00
22 2 770 0.00
23 2 771 0.00
24 1 869 375.60
25 2 869 0.00
26 2 869 0.00
27 2 869 0.00
28 2 1 0.00
29 2 1 0.00
30 2 467 0.00
So here's the code
s <- aggregate(X3 ~ X1 + X2, data = t , sum)
Here's the output
X1 X2 X3
1 2 1 0.00
2 2 135 0.00
3 2 313 0.00
4 2 467 0.00
5 2 495 0.00
6 2 769 0.00
7 2 770 0.00
8 2 771 0.00
9 1 869 23270.56
10 2 869 0.00
Upvotes: -1
Reputation: 38520
Here is a solution with data.table
. Assuming the data.frame is named df,
library(data.table)
setDT(df)
df[, temp:=rleid(X1, X2)][, .(X3=sum(X3)), by=.(X1, X2, temp)][, temp := NULL][]
X1 X2 X3
1: 1 869 22894.96
2: 2 1 0.00
3: 2 313 0.00
4: 2 467 0.00
5: 2 495 0.00
6: 2 135 0.00
7: 2 769 0.00
8: 2 770 0.00
9: 2 771 0.00
10: 1 869 375.60
11: 2 869 0.00
12: 2 1 0.00
13: 2 467 0.00
The multiple use of []
, called chaining, allows for multiple function calls in a single line of code. Further it allows you to manipulate the data.table and then call functions on this manipulated value.
temp:=rleid(X1, X2)
creates a temporary variable that creates an ID for X2 and X3 combinations that allows for repeated appearances of non-adjacent values..(X3=sum(X3)), by=.(X1, X2, temp)
sums X3 by the three variables.temp := NULL
removes the temporary variable[]
at the end prints out the result.Upvotes: 4
Reputation: 88
A short solution using aggregate()
ag <- aggregate(. ~ X2, data = df, FUN = sum)
Upvotes: -1