Reputation: 13

R: removing repetitions from one column and sum another

Say I have a data matrix that looks like this:

   X1  X2      X3
1   1 869 1956.78
2   1 869 2006.52
3   1 869 1592.80
4   1 869 6620.00
5   1 869  228.22
6   1 869  110.15
7   1 869 1350.79
8   1 869   78.36
9   1 869  118.61
10  1 869  453.99
11  1 869 5393.16
12  1 869 1641.34
13  1 869  885.80
14  1 869  352.80
15  1 869  105.64
16  2   1    0.00
17  2 313    0.00
18  2 467    0.00
19  2 495    0.00
20  2 135    0.00
21  2 769    0.00
22  2 770    0.00
23  2 771    0.00
24  1 869  375.60
25  2 869    0.00
26  2 869    0.00
27  2 869    0.00
28  2   1    0.00
29  2   1    0.00
30  2 467    0.00

I want to format it as so:

   X1  X2       X3
1   1 869 22894.96
2   2   1    0.00
3   2 313    0.00
4   2 467    0.00
5   2 495    0.00
6   2 135    0.00
7   2 769    0.00
8   2 770    0.00
9   2 771    0.00
10  1 869  375.60
11  2 869    0.00
12  2   1    0.00
13  2 467    0.00

So basically, I want to remove consecutive repetitions in X2 and X1 and then sum the associated values in X3.

The seems like a very complex idea to me, perhaps there is an elegant solution. I know that if X3 is not zero, then the corresponding value in X1 is "1". So I can determine where this happens by using placement:

placement <- grep(1, df$X1)

I'm not sure how to continue from here.

Upvotes: 1

Answers (3)

kat479

Reputation: 1

Call the dataset t.

    X1  X2  X3
1   1   869 1956.78
2   1   869 2006.52
3   1   869 1592.80
4   1   869 6620.00
5   1   869 228.22
6   1   869 110.15
7   1   869 1350.79
8   1   869 78.36
9   1   869 118.61
10  1   869 453.99
11  1   869 5393.16
12  1   869 1641.34
13  1   869 885.80
14  1   869 352.80
15  1   869 105.64
16  2   1   0.00
17  2   313 0.00
18  2   467 0.00
19  2   495 0.00
20  2   135 0.00
21  2   769 0.00
22  2   770 0.00
23  2   771 0.00
24  1   869 375.60
25  2   869 0.00
26  2   869 0.00
27  2   869 0.00
28  2   1   0.00
29  2   1   0.00
30  2   467 0.00

So here's the code

s <- aggregate(X3 ~ X1 + X2, data = t , sum)

Here's the output

    X1  X2  X3
1   2   1   0.00
2   2   135 0.00
3   2   313 0.00
4   2   467 0.00
5   2   495 0.00
6   2   769 0.00
7   2   770 0.00
8   2   771 0.00
9   1   869 23270.56
10  2   869 0.00

Upvotes: -1

lmo

Reputation: 38520

Here is a solution with data.table. Assuming the data.frame is named df,

library(data.table)
setDT(df)

df[, temp:=rleid(X1, X2)][, .(X3=sum(X3)), by=.(X1, X2, temp)][, temp := NULL][]
    X1  X2       X3
 1:  1 869 22894.96
 2:  2   1     0.00
 3:  2 313     0.00
 4:  2 467     0.00
 5:  2 495     0.00
 6:  2 135     0.00
 7:  2 769     0.00
 8:  2 770     0.00
 9:  2 771     0.00
10:  1 869   375.60
11:  2 869     0.00
12:  2   1     0.00
13:  2 467     0.00

The multiple use of [], called chaining, allows for multiple function calls in a single line of code. Further it allows you to manipulate the data.table and then call functions on this manipulated value.

temp:=rleid(X1, X2) creates a temporary variable that creates an ID for X2 and X3 combinations that allows for repeated appearances of non-adjacent values.
.(X3=sum(X3)), by=.(X1, X2, temp) sums X3 by the three variables.
temp := NULL removes the temporary variable
the empty [] at the end prints out the result.

Upvotes: 4

user116

Reputation: 88

A short solution using aggregate()

ag <- aggregate(. ~ X2, data = df, FUN = sum)

Upvotes: -1

R: removing repetitions from one column and sum another

Answers (3)

Related Questions