Reputation: 23
I have a data frame named "SpatialKey" with three columns. First column contains 5 categories representing population quintile. The second column has 4 kind of data: 0, 400, 800 and 1200. The third column represents population.
For example
quintile | isocrona | total |
---|---|---|
4 | 1200 | 1674 |
1 | 400 | 1676 |
4 | 400 | 1723 |
5 | 800 | 1567 |
3 | 0 | 1531 |
3 | 1200 | 1370 |
2 | 1200 | 1925 |
1 | 400 | 1916 |
5 | 0 | 1776 |
2 | 800 | 1896 |
3 | 800 | 2143 |
5 | 400 | 2098 |
4 | 400 | 1496 |
1 | 0 | 961 |
4 | 800 | 1684 |
I want to clasify the data by quintile and sum the population by the 4 kind of data I have in the second column. For example:
0 | 400 | 800 | 1200 | |
---|---|---|---|---|
1 | 961 | 3592 | 0 | 0 |
2 | 0 | 0 | 1896 | 1925 |
3 | 1531 | 0 | 2143 | 1370 |
4 | 0 | 3219 | 1684 | 1674 |
5 | 1776 | 2098 | 1567 | 0 |
And here is my code.
po <- SpatialKey %>%
group_by(quintile, isocrona) %>%
summarise_at(vars(contains("total")), sum)
final_df <- as.data.frame(t(po))
But R give me the following table:
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
quintile | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 5 | 5 | 5 | 5 |
isocrona | 0 | 400 | 800 | 1200 | 0 | 400 | 800 | 1200 | 0 | 400 | 800 | 1200 | 0 | 400 | 800 | 1200 | 0 | 400 | 800 | 1200 |
total | 961 | 3592 | 0 | 0 | 0 | 0 | 1896 | 1925 | 1531 | 0 | 2143 | 1370 | 0 | 3219 | 1684 | 1674 | 1776 | 2098 | 1567 | 0 |
How would I do the second table in R?
Upvotes: 1
Views: 386
Reputation: 2419
A method based on the idea of group
. The benefit is the result is still the dataframe format.
Result of long format:
library(data.table)
dt.long <- setDT(SpatialKey)[,sum(total),keyby = .(quintile,isocrona)]
dt.long
quintile isocrona V1
1: 1 0 961
2: 1 400 3592
3: 2 800 1896
4: 2 1200 1925
5: 3 0 1531
6: 3 800 2143
7: 3 1200 1370
8: 4 400 3219
9: 4 800 1684
10: 4 1200 1674
11: 5 0 1776
12: 5 400 2098
13: 5 800 1567
Result of wide format:
dcast(dt.long,quintile ~ isocrona,fill = 0,value.var = "V1")
quintile 0 400 800 1200
1: 1 961 3592 0 0
2: 2 0 0 1896 1925
3: 3 1531 0 2143 1370
4: 4 0 3219 1684 1674
5: 5 1776 2098 1567 0
Data:
SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L,
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L,
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L),
total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L,
1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)),
class = "data.frame", row.names = c(NA,
-15L))
Upvotes: 1
Reputation: 269431
Use xtabs. Put the variable to be summed on the left hand side of the formula and the others on the right hand side. We can use dot to mean all the rest. No packages are used.
xtabs(total ~., SpatialKey)
giving this xtabs table:
isocrona
quintile 0 400 800 1200
1 961 3592 0 0
2 0 0 1896 1925
3 1531 0 2143 1370
4 0 3219 1684 1674
5 1776 2098 1567 0
The input in reproducible form is:
SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L,
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L,
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L),
total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L,
1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)),
class = "data.frame", row.names = c(NA, -15L))
Upvotes: 2
Reputation: 886938
Here we need a pivot_wider
to reshape into 'wide' format while doing the sum
library(dplyr)
library(tidyr)
SpatialKey %>%
arrange(quintile, isocrona) %>%
pivot_wider(names_from = isocrona, values_from = total,
values_fn = sum, values_fill = 0)
-output
# A tibble: 5 x 5
# quintile `0` `400` `800` `1200`
# <int> <int> <int> <int> <int>
#1 1 961 3592 0 0
#2 2 0 0 1896 1925
#3 3 1531 0 2143 1370
#4 4 0 3219 1684 1674
#5 5 1776 2098 1567 0
Or use xtabs
from base R
xtabs(total ~ quintile + isocrona, SpatialKey)
SpatialKey <- structure(list(quintile = c(4L, 1L, 4L, 5L, 3L, 3L, 2L, 1L, 5L,
2L, 3L, 5L, 4L, 1L, 4L), isocrona = c(1200L, 400L, 400L, 800L,
0L, 1200L, 1200L, 400L, 0L, 800L, 800L, 400L, 400L, 0L, 800L),
total = c(1674L, 1676L, 1723L, 1567L, 1531L, 1370L, 1925L,
1916L, 1776L, 1896L, 2143L, 2098L, 1496L, 961L, 1684L)),
class = "data.frame", row.names = c(NA,
-15L))
Upvotes: 1