AntVal
AntVal

Reputation: 665

Pivot data with values from different columns in R

Apologies if I'm double-posting but I have tried different things I've seen in stackoverflow & can't quite solve the issue or understand why I'm having it.

So I have a dataset like this:

council_name <- c("Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barking and Dagenham","Barnet","Barnet")
period <- c("1st 2006", "1st 2006", "1st 2006", "1st 2006", "2nd 2006", "2nd 2006", "2nd 2006", "2nd 2006", "1st 2006", "1st 2006") 
category <- c ("glass", "fridges", "paper", "glass", "glass", "fridges", "paper", "glass", "glass", "fridges")
data <- c(333, 222, 100, 98, 450, 540, 33, 450, 560, 120)
category2 <- c ("collected", "collected", "collected", "no donors", "collected", "collected", "collected", "no donors", "collected", "collected")
df <- data.frame (council_name, period, category, category2, data)

And what I want is something like this:

council_name <- c("Barking and Dagenham","Barking and Dagenham","Barnet")
period <- c("1st 2006", "2nd 2006", "1st 2006") 
glass <- c(333, 450, 560)
fridges <- c(222,540,120)
paper <- c(100, 33, NA)
no.donors <- c(98, 450, NA)
df.desired <- data.frame (council_name, period, glass, fridges, paper, no.donors)

I've been trying multiple things with pivot functions but the fact that I need to extract column names from both category1 and category2 but filling cells with values from the same column in df is giving me all sorts of problems.

Thanks a lot for the help!

Upvotes: 0

Views: 409

Answers (4)

talat
talat

Reputation: 70256

Here's another data.table approach with aggregation inside dcast:

library(data.table)
setDT(df)

dcast(df, council_name + period ~ category + category2, value.var = "data", fun.aggregate = sum)
#            council_name   period fridges_collected glass_collected glass_no donors paper_collected
# 1: Barking and Dagenham 1st 2006               222             333              98             100
# 2: Barking and Dagenham 2nd 2006               540             450             450              33
# 3:               Barnet 1st 2006               120             560               0               0

Upvotes: 1

PavoDive
PavoDive

Reputation: 6486

The following approach is a data.table one:

with your data:

library(data.table)

setDT(df)

df[, sum(data), by = .(council_name, period, category, category2)][, dcast(.SD, council_name + period ~ category + category2, value.var = "V1")]

Which will produce

df
           council_name   period fridges_collected glass_collected glass_no donors paper_collected
1: Barking and Dagenham 1st 2006               222             333              98             100
2: Barking and Dagenham 2nd 2006               540             450             450              33
3:               Barnet 1st 2006               120             560              NA              NA

That is essentially your data, except by the differences on names and column order. Those can be fixed with data.table's setnames and setcolorder.

What is in there:

  • sum(data), by = .(council_name... will totalize your data for each unique combination of council, period, category and category2. Please consider running just this bit to see the output: df[, sum(data), by = .(council_name, period, category, category2)]
  • The second part, [, dcast(.SD, council_name + period ~ category + category2, value.var = V1)] converts the long table that resulted before into a wide one (the format you want). It's saying that it wants council name and period in the rows and category and category2 in the columns, and that the value variable is V1 (V1 is a dummy name given in the first chained command, because I just wrote sum(data), without giving it a name).

Hope it helps.

Upvotes: 2

Here is a tidyverse solution using pivot_wider to transform your data into a wide format and then use rename to change the column names.

library(tidyverse)

df %>%
  # Pivot from long to wide format using the first two columns as id cols and using both category and category2 columns to get the new column names
  pivot_wider(id_cols = c(council_name,period),
              names_from = c(category, category2),
              values_from = data) %>%
  # Rename the columns
  rename("glass" = "glass_collected",
         "fridges" = "fridges_collected",
         "paper" = "paper_collected",
         "no.donors" = "glass_no donors")

# A tibble: 3 x 6
# council_name         period   glass fridges paper no.donors
# <fct>                <fct>    <dbl>   <dbl> <dbl>     <dbl>
# 1 Barking and Dagenham 1st 2006   333     222   100        98
# 2 Barking and Dagenham 2nd 2006   450     540    33       450
# 3 Barnet               1st 2006   560     120    NA        NA

Upvotes: 2

sharmajee499
sharmajee499

Reputation: 136

At this situation you can use the spread() function. The code looks like:

spread(data=df, key=category, value=data)

Upvotes: 0

Related Questions