Mologan
Mologan

Reputation: 37

how to summarise a column and count different values?

I have a dataframe such that

Time       Name    Vote
20100102   Bob     Positive
20100104   Carlos  Negative
20100106   Kinder  Negative
20100106   Tony    Positive
.
.
.

I want to group the data by years and find the number of positive votes and the number of negative votes for different years.
the expected result is:

Year    Positive    Negative
2010       1201        891
2011       2039        189
.
.

Code used :

vote_year <- infile %>%
  group_by(Year = cut(Time,breaks = seq(20100100,20210100,by=10000))) %>%
  summarise(Positive = n(Vote == 'Positive'),Nagative = n(Vote == 'Negative')) %>%
  mutate(Year = seq(2010,2020))

I believe the issue is on summarise command, but I don't know how to fix it. It also seems the group_by function does not create an appropriate dataframe.

Upvotes: 2

Views: 436

Answers (3)

AnilGoyal
AnilGoyal

Reputation: 26238

BaseR

table(lubridate::year(lubridate::ymd(df1$Time)), df1$Vote)

#OR

table(substr(df1$Time, 1, 4), df1$Vote)

       Negative Positive
  2010        2        2
  2011        2        0

library(janitor) also helps

df1 <- read.table(header = TRUE, text = "
                  Time       Name    Vote
20100102   Bob     Positive
20100104   Carlos  Negative
20100106   Kinder  Negative
20100106   Tony    Positive
20110104   Carlo   Negative
20110106   Walt    Negative                  ")

library(lubridate)

library(janitor)

library(dplyr)


df1 %>% mutate(year = year(ymd(Time))) %>%
  tabyl(year, Vote)
#>  year Negative Positive
#>  2010        2        2
#>  2011        2        0

Janitor is more helpful because you can make further useful summary tables like this

df1 %>% mutate(year = year(ymd(Time))) %>%
  tabyl(year, Vote) %>%
  adorn_totals(c('row', 'col')) %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 2) %>%
  adorn_ns("front")

  year    Negative   Positive       Total
  2010 2  (50.00%) 2 (50.00%) 4 (100.00%)
  2011 2 (100.00%) 0  (0.00%) 2 (100.00%)
 Total 4  (66.67%) 2 (33.33%) 6 (100.00%)

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 102920

Here is a data.table option using dcast (thank @AnilGoyal for the data)

dcast(setDT(df), substr(Time, 1, 4) ~ Vote)

which gives

   Time Negative Positive
1: 2010        2        2
2: 2011        2        0

Upvotes: 0

Anoushiravan R
Anoushiravan R

Reputation: 21938

I think you can use the following solution:

library(dplyr)
library(tidyr)
library(lubridate)

df1 %>% 
  group_by(Year = year(ymd(Time))) %>%
  count(Vote) %>%
  pivot_wider(names_from = Vote, values_from = n)

# A tibble: 1 x 3
# Groups:   Year [1]
   Year Negative Positive
  <dbl>    <int>    <int>
1  2010        2        2

Data

df1 <- read.table(header = TRUE, text = "
                  Time       Name    Vote
20100102   Bob     Positive
20100104   Carlos  Negative
20100106   Kinder  Negative
20100106   Tony    Positive")

Upvotes: 0

Related Questions