jlozasi
jlozasi

Reputation: 57

Summarizing the count of values of one column in 2 different columns

I have a df called reviews_gh with the following format

Date         Market  Positive.or.Negative.
01-01-2020     A              Positive
01-01-2020     A              Positive
01-01-2020     B              Positive
01-01-2020     B              Negative
....

I am trying to group by date and business and create a new column called positive and negative that sums the number of times a negative and a positive are in that Market on that day

this is the code I have right now

  reviews_gh_agg <- reviews_gh %>% 
  group_by(Date, Market) %>% 
  summarise(positive = sum(reviews_gh$Positive.or.Negative.=="Positive"), negative = 
  sum(reviews_gh$Positive.or.Negative.=="Negative") )

But the result I get is wrong, I am getting on the new positive and negative column the TOTAL sum of ALL the observations, not grouped by day and market

the result for the small example at top should be

    Date         Market  Positive     Negative
01-01-2020     A            2            0
01-01-2020     B            1            1         

Thanks for the help

Upvotes: 1

Views: 36

Answers (3)

Anoushiravan R
Anoushiravan R

Reputation: 21908

I hope this is what you are looking for. I just made a slight modification to your code as you don't need $ to refer to a column name in tidyverse thanks to data masking.

df %>% 
  group_by(Date, Market) %>% 
  summarise(positive = sum(Positive.or.Negative.=="Positive"), negative = 
              sum(Positive.or.Negative.=="Negative"))


# A tibble: 2 x 4
# Groups:   Date [1]
  Date       Market positive negative
  <chr>      <chr>     <int>    <int>
1 01-01-2020 A             2        0
2 01-01-2020 B             1        1

Updated Yet another valuable solution by dear @akrun.

df %>%
  group_by(Date, Market) %>%
  summarise(out = list(table(Positive.or.Negative.)), .groups = "drop") %>%
  unnest_wider(c(out))

# A tibble: 2 x 4
  Date       Market Positive Negative
  <chr>      <chr>     <int>    <int>
1 01-01-2020 A             2       NA
2 01-01-2020 B             1        1

Date

df <- tribble(
  ~Date,         ~Market,  ~Positive.or.Negative.,
  "01-01-2020",     "A",              "Positive",
  "01-01-2020",     "A",              "Positive",
  "01-01-2020",     "B",              "Positive",
  "01-01-2020",     "B",              "Negative"
)

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388807

You can do this with tidyr::pivot_wider :

tidyr::pivot_wider(df, names_from = Positive.or.Negative., 
                       values_from = Positive.or.Negative., 
                       values_fn = length, 
                       values_fill = 0)

#  Date       Market Positive Negative
#  <chr>      <chr>     <int>    <int>
#1 01-01-2020 A             2        0
#2 01-01-2020 B             1        1

and with data.table :

library(data.table)

dcast(setDT(df),  Date + Market~Positive.or.Negative., 
      value.var = 'Positive.or.Negative.', fun.aggregate = length)

Upvotes: 0

Here is another tidyverse solution, using count and pivot_wider.

library(tidyverse)

df %>% 
  # Group by Date, Market and Positive/Negative
  group_by(Date, Market, Positive.or.Negative.) %>%
  # Count
  count() %>%
  # Change to wide format, fill NA with 0's
  pivot_wider(names_from = Positive.or.Negative.,
              values_from = n,
              values_fill = 0)

Upvotes: 1

Related Questions