Reputation: 73
I'm new to R and my question might seem easy for most of you. I have a data like this
> data.frame(table(dat),total)
AGEintervals mytest.G_B_FLAG Freq total
1 (1,23] 0 5718 5912
2 (23,26] 0 5249 5579
3 (26,28] 0 3105 3314
4 (28,33] 0 6277 6693
5 (33,37] 0 4443 4682
6 (37,41] 0 4277 4514
7 (41,46] 0 4904 5169
8 (46,51] 0 4582 4812
9 (51,57] 0 4039 4236
10 (57,76] 0 3926 4031
11 (1,23] 1 194 5912
12 (23,26] 1 330 5579
13 (26,28] 1 209 3314
14 (28,33] 1 416 6693
15 (33,37] 1 239 4682
16 (37,41] 1 237 4514
17 (41,46] 1 265 5169
18 (46,51] 1 230 4812
19 (51,57] 1 197 4236
20 (57,76] 1 105 4031
As you might have noticed age intervals start to repeating on 11 row. All I need is to get 10 rows and 0's and 1' in different columns. Like this
AGEintervals 1 0 total
1 (1,23] 194 5718 5912
2 (23,26] 330 5249 5579
3 (26,28] 209 3105 3314
4 (28,33] 416 6277 6693
5 (33,37] 239 4443 4682
6 (37,41] 237 4277 4514
7 (41,46] 265 4904 5169
8 (46,51] 230 4582 4812
9 (51,57] 197 4039 4236
10 (57,76] 105 3926 4031
Many thanks
Upvotes: 0
Views: 60
Reputation: 193497
This is a straightforward "long" to "wide" transformation that is easy to achieve with reshape
from base R:
reshape(mydf, idvar = c("AGEintervals", "total"),
timevar = "mytest.G_B_FLAG", direction = "wide")
# AGEintervals total Freq.0 Freq.1
# 1 (1,23] 5912 5718 194
# 2 (23,26] 5579 5249 330
# 3 (26,28] 3314 3105 209
# 4 (28,33] 6693 6277 416
# 5 (33,37] 4682 4443 239
# 6 (37,41] 4514 4277 237
# 7 (41,46] 5169 4904 265
# 8 (46,51] 4812 4582 230
# 9 (51,57] 4236 4039 197
# 10 (57,76] 4031 3926 105
Other alternatives include:
reshape2
library(reshape2)
dcast(mydf, ... ~ mytest.G_B_FLAG, value.var='Freq')
tidyr
library(tidyr)
spread(df, mytest.G_B_FLAG, Freq)
This problem is possibly avoidable in the first place.
Run the following example code and compare the output at each stage:
## Create some sample data
set.seed(1)
dat <- data.frame(V1 = sample(letters[1:3], 20, TRUE),
V2 = sample(c(0, 1), 20, TRUE))
## View the output
dat
## Look what happens when we use `data.frame` on a `table`
data.frame(table(dat))
## Compare it with `as.data.frame.matrix`
as.data.frame.matrix(table(dat))
## The total can be added automatically with `addmargins`
as.data.frame.matrix(addmargins(table(dat), 2, sum))
Upvotes: 2