Reputation: 37
I have a dataframe such that
Time Name Vote
20100102 Bob Positive
20100104 Carlos Negative
20100106 Kinder Negative
20100106 Tony Positive
.
.
.
I want to group the data by years and find the number of positive votes and the number of negative votes for different years.
the expected result is:
Year Positive Negative
2010 1201 891
2011 2039 189
.
.
Code used :
vote_year <- infile %>%
group_by(Year = cut(Time,breaks = seq(20100100,20210100,by=10000))) %>%
summarise(Positive = n(Vote == 'Positive'),Nagative = n(Vote == 'Negative')) %>%
mutate(Year = seq(2010,2020))
I believe the issue is on summarise command, but I don't know how to fix it. It also seems the group_by function does not create an appropriate dataframe.
Upvotes: 2
Views: 436
Reputation: 26238
BaseR
table(lubridate::year(lubridate::ymd(df1$Time)), df1$Vote)
#OR
table(substr(df1$Time, 1, 4), df1$Vote)
Negative Positive
2010 2 2
2011 2 0
library(janitor)
also helps
df1 <- read.table(header = TRUE, text = "
Time Name Vote
20100102 Bob Positive
20100104 Carlos Negative
20100106 Kinder Negative
20100106 Tony Positive
20110104 Carlo Negative
20110106 Walt Negative ")
library(lubridate)
library(janitor)
library(dplyr)
df1 %>% mutate(year = year(ymd(Time))) %>%
tabyl(year, Vote)
#> year Negative Positive
#> 2010 2 2
#> 2011 2 0
Janitor
is more helpful because you can make further useful summary tables like this
df1 %>% mutate(year = year(ymd(Time))) %>%
tabyl(year, Vote) %>%
adorn_totals(c('row', 'col')) %>%
adorn_percentages() %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns("front")
year Negative Positive Total
2010 2 (50.00%) 2 (50.00%) 4 (100.00%)
2011 2 (100.00%) 0 (0.00%) 2 (100.00%)
Total 4 (66.67%) 2 (33.33%) 6 (100.00%)
Upvotes: 2
Reputation: 102920
Here is a data.table
option using dcast
(thank @AnilGoyal for the data)
dcast(setDT(df), substr(Time, 1, 4) ~ Vote)
which gives
Time Negative Positive
1: 2010 2 2
2: 2011 2 0
Upvotes: 0
Reputation: 21938
I think you can use the following solution:
library(dplyr)
library(tidyr)
library(lubridate)
df1 %>%
group_by(Year = year(ymd(Time))) %>%
count(Vote) %>%
pivot_wider(names_from = Vote, values_from = n)
# A tibble: 1 x 3
# Groups: Year [1]
Year Negative Positive
<dbl> <int> <int>
1 2010 2 2
Data
df1 <- read.table(header = TRUE, text = "
Time Name Vote
20100102 Bob Positive
20100104 Carlos Negative
20100106 Kinder Negative
20100106 Tony Positive")
Upvotes: 0