Ricardo Miranda
Ricardo Miranda

Reputation: 23

How to sum variables by groups in R

I have a data frame named "SpatialKey" with three columns. First column contains 5 categories representing population quintile. The second column has 4 kind of data: 0, 400, 800 and 1200. The third column represents population.

For example

quintile isocrona total
4 1200 1674
1 400 1676
4 400 1723
5 800 1567
3 0 1531
3 1200 1370
2 1200 1925
1 400 1916
5 0 1776
2 800 1896
3 800 2143
5 400 2098
4 400 1496
1 0 961
4 800 1684

I want to clasify the data by quintile and sum the population by the 4 kind of data I have in the second column. For example:

0 400 800 1200
1 961 3592 0 0
2 0 0 1896 1925
3 1531 0 2143 1370
4 0 3219 1684 1674
5 1776 2098 1567 0

And here is my code.

po <- SpatialKey %>%
group_by(quintile, isocrona) %>%
summarise_at(vars(contains("total")), sum)
final_df <- as.data.frame(t(po))

But R give me the following table:

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
quintile 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5
isocrona 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200
total 961 3592 0 0 0 0 1896 1925 1531 0 2143 1370 0 3219 1684 1674 1776 2098 1567 0

How would I do the second table in R?

Upvotes: 1

Views: 386

Answers (3)

Peace Wang
Peace Wang

Reputation: 2419

A method based on the idea of group. The benefit is the result is still the dataframe format.

Result of long format:

library(data.table)
dt.long <- setDT(SpatialKey)[,sum(total),keyby = .(quintile,isocrona)]
dt.long

   quintile isocrona   V1
 1:        1        0  961
 2:        1      400 3592
 3:        2      800 1896
 4:        2     1200 1925
 5:        3        0 1531
 6:        3      800 2143
 7:        3     1200 1370
 8:        4      400 3219
 9:        4      800 1684
10:        4     1200 1674
11:        5        0 1776
12:        5      400 2098
13:        5      800 1567

Result of wide format:

dcast(dt.long,quintile ~ isocrona,fill = 0,value.var = "V1")

   quintile    0  400  800 1200
1:        1  961 3592    0    0
2:        2    0    0 1896 1925
3:        3 1531    0 2143 1370
4:        4    0 3219 1684 1674
5:        5 1776 2098 1567    0

Data:

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, 
-15L))

Upvotes: 1

G. Grothendieck
G. Grothendieck

Reputation: 269431

Use xtabs. Put the variable to be summed on the left hand side of the formula and the others on the right hand side. We can use dot to mean all the rest. No packages are used.

xtabs(total ~., SpatialKey)

giving this xtabs table:

        isocrona
quintile    0  400  800 1200
       1  961 3592    0    0
       2    0    0 1896 1925
       3 1531    0 2143 1370
       4    0 3219 1684 1674
       5 1776 2098 1567    0

Note

The input in reproducible form is:

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, -15L))

Upvotes: 2

akrun
akrun

Reputation: 886938

Here we need a pivot_wider to reshape into 'wide' format while doing the sum

library(dplyr)
library(tidyr)
SpatialKey %>%
    arrange(quintile, isocrona) %>%
    pivot_wider(names_from = isocrona, values_from = total, 
        values_fn = sum, values_fill = 0)

-output

# A tibble: 5 x 5
#  quintile   `0` `400` `800` `1200`
#     <int> <int> <int> <int>  <int>
#1        1   961  3592     0      0
#2        2     0     0  1896   1925
#3        3  1531     0  2143   1370
#4        4     0  3219  1684   1674
#5        5  1776  2098  1567      0

Or use xtabs from base R

xtabs(total ~ quintile + isocrona, SpatialKey)

data

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, 
-15L))

Upvotes: 1

Related Questions