I have a data frame named "SpatialKey" with three columns. First column contains 5 categories representing population quintile. The second column has 4 kind of data: 0, 400, 800 and 1200. The third column represents population. For example quintile isocrona total 4 1200 1674 1 400 1676 4 400 1723 5 800 1567 3 0 1531 3 1200 1370 2 1200 1925 1 400 1916 5 0 1776 2 800 1896 3 800 2143 5 400 2098 4 400 1496 1 0 961 4 800 1684 I want to clasify the data by quintile and sum the population by the 4 kind of data I have in the second column. For example: 0 400 800 1200 1 961 3592 0 0 2 0 0 1896 1925 3 1531 0 2143 1370 4 0 3219 1684 1674 5 1776 2098 1567 0 And here is my code. po <- SpatialKey %>% group_by(quintile, isocrona) %>% summarise_at(vars(contains("total")), sum) final_df <- as.data.frame(t(po)) But R give me the following table: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 quintile 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 isocrona 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 0 400 800 1200 total 961 3592 0 0 0 0 1896 1925 1531 0 2143 1370 0 3219 1684 1674 1776 2098 1567 0 How would I do the second table in R?

Reputation: 23

How to sum variables by groups in R

I have a data frame named "SpatialKey" with three columns. First column contains 5 categories representing population quintile. The second column has 4 kind of data: 0, 400, 800 and 1200. The third column represents population.

For example

quintile	isocrona	total
4	1200	1674
1	400	1676
4	400	1723
5	800	1567
3	0	1531
3	1200	1370
2	1200	1925
1	400	1916
5	0	1776
2	800	1896
3	800	2143
5	400	2098
4	400	1496
1	0	961
4	800	1684

I want to clasify the data by quintile and sum the population by the 4 kind of data I have in the second column. For example:

	0	400	800	1200
1	961	3592	0	0
2	0	0	1896	1925
3	1531	0	2143	1370
4	0	3219	1684	1674
5	1776	2098	1567	0

And here is my code.

po <- SpatialKey %>%
group_by(quintile, isocrona) %>%
summarise_at(vars(contains("total")), sum)
final_df <- as.data.frame(t(po))

But R give me the following table:

	V1	V2	V3	V4	V5	V6	V7	V8	V9	V10	V11	V12	V13	V14	V15	V16	V17	V18	V19	V20
quintile	1	1	1	1	2	2	2	2	3	3	3	3	4	4	4	4	5	5	5	5
isocrona	0	400	800	1200	0	400	800	1200	0	400	800	1200	0	400	800	1200	0	400	800	1200
total	961	3592	0	0	0	0	1896	1925	1531	0	2143	1370	0	3219	1684	1674	1776	2098	1567	0

How would I do the second table in R?

Upvotes: 1

Answers (3)

Peace Wang

Reputation: 2419

A method based on the idea of group. The benefit is the result is still the dataframe format.

Result of long format:

library(data.table)
dt.long <- setDT(SpatialKey)[,sum(total),keyby = .(quintile,isocrona)]
dt.long

   quintile isocrona   V1
 1:        1        0  961
 2:        1      400 3592
 3:        2      800 1896
 4:        2     1200 1925
 5:        3        0 1531
 6:        3      800 2143
 7:        3     1200 1370
 8:        4      400 3219
 9:        4      800 1684
10:        4     1200 1674
11:        5        0 1776
12:        5      400 2098
13:        5      800 1567

Result of wide format:

dcast(dt.long,quintile ~ isocrona,fill = 0,value.var = "V1")

   quintile    0  400  800 1200
1:        1  961 3592    0    0
2:        2    0    0 1896 1925
3:        3 1531    0 2143 1370
4:        4    0 3219 1684 1674
5:        5 1776 2098 1567    0

Data:

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, 
-15L))

Upvotes: 1

G. Grothendieck

Reputation: 269431

Use xtabs. Put the variable to be summed on the left hand side of the formula and the others on the right hand side. We can use dot to mean all the rest. No packages are used.

xtabs(total ~., SpatialKey)

giving this xtabs table:

        isocrona
quintile    0  400  800 1200
       1  961 3592    0    0
       2    0    0 1896 1925
       3 1531    0 2143 1370
       4    0 3219 1684 1674
       5 1776 2098 1567    0

Note

The input in reproducible form is:

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, -15L))

Upvotes: 2

akrun

Reputation: 886938

Here we need a pivot_wider to reshape into 'wide' format while doing the sum

library(dplyr)
library(tidyr)
SpatialKey %>%
    arrange(quintile, isocrona) %>%
    pivot_wider(names_from = isocrona, values_from = total, 
        values_fn = sum, values_fill = 0)

-output

# A tibble: 5 x 5
#  quintile   `0` `400` `800` `1200`
#     <int> <int> <int> <int>  <int>
#1        1   961  3592     0      0
#2        2     0     0  1896   1925
#3        3  1531     0  2143   1370
#4        4     0  3219  1684   1674
#5        5  1776  2098  1567      0

Or use xtabs from base R

xtabs(total ~ quintile + isocrona, SpatialKey)

data

SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L, 
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L, 
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L), 
    total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L, 
    1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)), 
    class = "data.frame", row.names = c(NA, 
-15L))

Upvotes: 1

How to sum variables by groups in R

Answers (3)

Note

data

Related Questions