user2105555
user2105555

Reputation:

merge data tables in R

My apologies for this simple question. Basically, I want to make three separate cumsum() tables and merge them together by the first table. For example:

a <- cumsum(table(df$variable))
b <- cumsum(table(df$variable[c(TRUE, FALSE)]))
c <- cumsum(table(df$variable[c(FALSE, TRUE)]))

Where a is the cumsum of the entire vector of df$variable, b is the cumsum of the odd-numbered values of df$variable, c is the cumsum of the even-numbered values of df$variable. Another way of interpreting this is that combining b and c produces a.

This is the entire vector of numbers.

  [1] 18 17 15 10  5  0 10 10  0 10 15  5  5  5 25 15 13  0  0  0 25 18 15 15  1  4  5
 [28]  5  5 15  5 12 15  0  3 12 20  0  5  5 13 10 10 10  3 15 13 20 12 60 10 10  2  0
 [55]  5 10  8  4  0 15  5  5 15  5  0  5  2  8  5  5  5  5  9  9  3  7 20 25  5  4 10
 [82] 10  2  4  5  5 18  8  0 10  5  5  7 12  5 13 26 20 13 21  5 15 10 10  5 15  5 15
[109]  0  1 13 21 25 25  5 14  5 15 10  0  5 15  3  4  5 15 15  5 25 25  5 15  0  2 13
[136] 22  2 10  3  3 15 11  0  2 40 35 24 24  5  5 10  5 16  0 17 19 20  5  5  5  0 15
[163]  3 13 20  4  5  5  3 19 25 25  0 15  5  3 22 22 25  5 15 15  5 15 17  9  5  5 15
[190] 10

For a, I used cbind(cumsum(table(df$variable)))

0     18
1     20
2     26
3     35
4     41
5     88
7     90
8     93
9     96
10   115
11   116
12   120
13   128
14   129
15   154
16   155
17   158
18   161
19   163
20   169
21   171
22   174
24   176
25   186
26   187
35   188
40   189
60   190

For b, I used cbind(cumsum(table(df$variable[c(TRUE, FALSE)])))

0    10
1    11
2    15
3    22
5    50
7    51
8    52
9    53
10   60
12   61
13   67
15   76
16   77
17   79
18   81
20   85
22   86
24   87
25   93
26   94
40   95

For c, I used cbind(cumsum(table(df$variable[c(FALSE, TRUE)])))

0     8
1     9
2    11
3    13
4    19
5    38
7    39
8    41
9    43
10   55
11   56
12   59
13   61
14   62
15   78
17   79
18   80
19   82
20   84
21   86
22   88
24   89
25   93
35   94
60   95

In frequency form, the distributions should look something like this.

    a   b   c
0   18  10  8
1   2   1   1
2   6   4   2
3   9   7   2
4   6   0   6
5   47  28  19
7   2   1   1
8   3   1   2
9   3   1   2
10  19  7   12
11  1   0   1
12  4   1   3
13  8   6   2
14  1   0   1
15  25  9   16
16  1   1   0
17  3   2   1
18  3   2   1
19  2   0   2
20  6   4   2
21  2   0   2
22  3   1   2
24  2   1   1
25  10  6   4
26  1   1   0
35  1   0   1
40  1   1   0
60  1   0   1
    190 95  95

But I want it in cumsum() form, such that it should look something like this. I wrote out the first 6 rows as illustration.

    a   b   c
0   18  10  8
1   20  11  9
2   26  15  11
3   35  22  13
4   41  22  19
5   88  50  38
7   90  51  39

The problem I've been having is that the subsets a and b doesn't have all the values (i.e. some values have 0 frequency), such that it shortens the length of the vector; as a result, I'm unable to properly merge or cbind() these values.

Any suggestion is greatly appreciated.

Upvotes: 0

Views: 103

Answers (1)

thelatemail
thelatemail

Reputation: 93813

You could probably get there using match quite easily. Assuming your data is:

set.seed(1)
df <- data.frame(variable=rbinom(10,prob=0.5,size=3))

Something like this seems to work

out <- data.frame(a,b=b[match(names(a),names(b))],c=c[match(names(a),names(c))])
replace(out,is.na(out),0)

#   a b c
#0  1 0 1
#1  4 2 2
#2  7 4 3
#3 10 5 5

Upvotes: 1

Related Questions