Reputation: 665
Apologies if I'm double-posting but I have tried different things I've seen in stackoverflow & can't quite solve the issue or understand why I'm having it.
So I have a dataset like this:
council_name <- c("Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barnet","Barnet")
period <- c("1st 2006", "1st 2006", "1st 2006", "1st 2006", "2nd 2006", "2nd 2006", "2nd 2006", "2nd 2006", "1st 2006", "1st 2006")
category <- c ("glass", "fridges", "paper", "glass", "glass", "fridges", "paper", "glass", "glass", "fridges")
data <- c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
category2 <- c ("collected", "collected", "collected", "no donors", "collected", "collected", "collected", "no donors", "collected", "collected")
df <- data.frame (council_name, period, category, category2, data)
And what I want is something like this:
council_name <- c("Barking and Dagenham","Barking and Dagenham","Barnet")
period <- c("1st 2006", "2nd 2006", "1st 2006")
glass <- c(333, 450, 560)
fridges <- c(222,540,120)
paper <- c(100, 33, NA)
no.donors <- c(98, 450, NA)
df.desired <- data.frame (council_name, period, glass, fridges, paper, no.donors)
I've been trying multiple things with pivot functions but the fact that I need to extract column names from both category1 and category2 but filling cells with values from the same column in df is giving me all sorts of problems.
Thanks a lot for the help!
Upvotes: 0
Views: 409
Reputation: 70256
Here's another data.table approach with aggregation inside dcast
:
library(data.table)
setDT(df)
dcast(df, council_name + period ~ category + category2, value.var = "data", fun.aggregate = sum)
# council_name period fridges_collected glass_collected glass_no donors paper_collected
# 1: Barking and Dagenham 1st 2006 222 333 98 100
# 2: Barking and Dagenham 2nd 2006 540 450 450 33
# 3: Barnet 1st 2006 120 560 0 0
Upvotes: 1
Reputation: 6486
The following approach is a data.table one:
with your data:
library(data.table)
setDT(df)
df[, sum(data), by = .(council_name, period, category, category2)][, dcast(.SD, council_name + period ~ category + category2, value.var = "V1")]
Which will produce
df
council_name period fridges_collected glass_collected glass_no donors paper_collected
1: Barking and Dagenham 1st 2006 222 333 98 100
2: Barking and Dagenham 2nd 2006 540 450 450 33
3: Barnet 1st 2006 120 560 NA NA
That is essentially your data, except by the differences on names and column order. Those can be fixed with data.table's setnames
and setcolorder
.
sum(data), by = .(council_name...
will totalize your data for each unique combination of council, period, category and category2. Please consider running just this bit to see the output: df[, sum(data), by = .(council_name, period, category, category2)]
[, dcast(.SD, council_name + period ~ category + category2, value.var = V1)]
converts the long table that resulted before into a wide one (the format you want). It's saying that it wants council name and period in the rows and category and category2 in the columns, and that the value variable is V1 (V1 is a dummy name given in the first chained command, because I just wrote sum(data)
, without giving it a name).Hope it helps.
Upvotes: 2
Reputation: 5620
Here is a tidyverse
solution using pivot_wider
to transform your data into a wide format and then use rename
to change the column names.
library(tidyverse)
df %>%
# Pivot from long to wide format using the first two columns as id cols and using both category and category2 columns to get the new column names
pivot_wider(id_cols = c(council_name,period),
names_from = c(category, category2),
values_from = data) %>%
# Rename the columns
rename("glass" = "glass_collected",
"fridges" = "fridges_collected",
"paper" = "paper_collected",
"no.donors" = "glass_no donors")
# A tibble: 3 x 6
# council_name period glass fridges paper no.donors
# <fct> <fct> <dbl> <dbl> <dbl> <dbl>
# 1 Barking and Dagenham 1st 2006 333 222 100 98
# 2 Barking and Dagenham 2nd 2006 450 540 33 450
# 3 Barnet 1st 2006 560 120 NA NA
Upvotes: 2
Reputation: 136
At this situation you can use the spread()
function. The code looks like:
spread(data=df, key=category, value=data)
Upvotes: 0