Madina Tultabayeva
Madina Tultabayeva

Reputation: 73

How to transform particular rows into columns in R

I'm new to R and my question might seem easy for most of you. I have a data like this

> data.frame(table(dat),total)
   AGEintervals mytest.G_B_FLAG Freq total
1        (1,23]               0 5718  5912
2       (23,26]               0 5249  5579
3       (26,28]               0 3105  3314
4       (28,33]               0 6277  6693
5       (33,37]               0 4443  4682
6       (37,41]               0 4277  4514
7       (41,46]               0 4904  5169
8       (46,51]               0 4582  4812
9       (51,57]               0 4039  4236
10      (57,76]               0 3926  4031
11       (1,23]               1  194  5912
12      (23,26]               1  330  5579
13      (26,28]               1  209  3314
14      (28,33]               1  416  6693
15      (33,37]               1  239  4682
16      (37,41]               1  237  4514
17      (41,46]               1  265  5169
18      (46,51]               1  230  4812
19      (51,57]               1  197  4236
20      (57,76]               1  105  4031

As you might have noticed age intervals start to repeating on 11 row. All I need is to get 10 rows and 0's and 1' in different columns. Like this

AGEintervals              1       0   total
1        (1,23]          194      5718  5912
2       (23,26]          330      5249  5579
3       (26,28]          209      3105  3314
4       (28,33]          416      6277  6693
5       (33,37]          239      4443  4682
6       (37,41]          237      4277  4514
7       (41,46]          265      4904  5169
8       (46,51]          230      4582  4812
9       (51,57]          197      4039  4236
10      (57,76]          105      3926  4031

Many thanks

Upvotes: 0

Views: 60

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193497

This is a straightforward "long" to "wide" transformation that is easy to achieve with reshape from base R:

reshape(mydf, idvar = c("AGEintervals", "total"), 
        timevar = "mytest.G_B_FLAG", direction = "wide")
#    AGEintervals total Freq.0 Freq.1
# 1        (1,23]  5912   5718    194
# 2       (23,26]  5579   5249    330
# 3       (26,28]  3314   3105    209
# 4       (28,33]  6693   6277    416
# 5       (33,37]  4682   4443    239
# 6       (37,41]  4514   4277    237
# 7       (41,46]  5169   4904    265
# 8       (46,51]  4812   4582    230
# 9       (51,57]  4236   4039    197
# 10      (57,76]  4031   3926    105

Other alternatives include:

reshape2

library(reshape2)
dcast(mydf, ... ~ mytest.G_B_FLAG, value.var='Freq')

tidyr

library(tidyr)
spread(df, mytest.G_B_FLAG, Freq)

Update

This problem is possibly avoidable in the first place.

Run the following example code and compare the output at each stage:

## Create some sample data
set.seed(1)
dat <- data.frame(V1 = sample(letters[1:3], 20, TRUE),
                  V2 = sample(c(0, 1), 20, TRUE))

## View the output
dat

## Look what happens when we use `data.frame` on a `table`
data.frame(table(dat))

## Compare it with `as.data.frame.matrix`
as.data.frame.matrix(table(dat))

## The total can be added automatically with `addmargins`
as.data.frame.matrix(addmargins(table(dat), 2, sum))

Upvotes: 2

Related Questions