Reputation: 57
I have a df called reviews_gh
with the following format
Date Market Positive.or.Negative.
01-01-2020 A Positive
01-01-2020 A Positive
01-01-2020 B Positive
01-01-2020 B Negative
....
I am trying to group by date and business and create a new column called positive and negative that sums the number of times a negative and a positive are in that Market on that day
this is the code I have right now
reviews_gh_agg <- reviews_gh %>%
group_by(Date, Market) %>%
summarise(positive = sum(reviews_gh$Positive.or.Negative.=="Positive"), negative =
sum(reviews_gh$Positive.or.Negative.=="Negative") )
But the result I get is wrong, I am getting on the new positive and negative column the TOTAL sum of ALL the observations, not grouped by day and market
the result for the small example at top should be
Date Market Positive Negative
01-01-2020 A 2 0
01-01-2020 B 1 1
Thanks for the help
Upvotes: 1
Views: 36
Reputation: 21908
I hope this is what you are looking for. I just made a slight modification to your code as you don't need $
to refer to a column name in tidyverse
thanks to data masking.
df %>%
group_by(Date, Market) %>%
summarise(positive = sum(Positive.or.Negative.=="Positive"), negative =
sum(Positive.or.Negative.=="Negative"))
# A tibble: 2 x 4
# Groups: Date [1]
Date Market positive negative
<chr> <chr> <int> <int>
1 01-01-2020 A 2 0
2 01-01-2020 B 1 1
Updated Yet another valuable solution by dear @akrun.
df %>%
group_by(Date, Market) %>%
summarise(out = list(table(Positive.or.Negative.)), .groups = "drop") %>%
unnest_wider(c(out))
# A tibble: 2 x 4
Date Market Positive Negative
<chr> <chr> <int> <int>
1 01-01-2020 A 2 NA
2 01-01-2020 B 1 1
Date
df <- tribble(
~Date, ~Market, ~Positive.or.Negative.,
"01-01-2020", "A", "Positive",
"01-01-2020", "A", "Positive",
"01-01-2020", "B", "Positive",
"01-01-2020", "B", "Negative"
)
Upvotes: 2
Reputation: 388807
You can do this with tidyr::pivot_wider
:
tidyr::pivot_wider(df, names_from = Positive.or.Negative.,
values_from = Positive.or.Negative.,
values_fn = length,
values_fill = 0)
# Date Market Positive Negative
# <chr> <chr> <int> <int>
#1 01-01-2020 A 2 0
#2 01-01-2020 B 1 1
and with data.table
:
library(data.table)
dcast(setDT(df), Date + Market~Positive.or.Negative.,
value.var = 'Positive.or.Negative.', fun.aggregate = length)
Upvotes: 0
Reputation: 5620
Here is another tidyverse
solution, using count
and pivot_wider
.
library(tidyverse)
df %>%
# Group by Date, Market and Positive/Negative
group_by(Date, Market, Positive.or.Negative.) %>%
# Count
count() %>%
# Change to wide format, fill NA with 0's
pivot_wider(names_from = Positive.or.Negative.,
values_from = n,
values_fill = 0)
Upvotes: 1