Reputation: 97
I need to count occurences within my dataframe but I am missing some values. I have a dataframe that looks like this:
ID Year
18941 2015
56511 2010
58185 2016
66183 1999
18863 2003
26635 2002
18564 2003
And I need to count the occurences of the Years.
I have already managed to get an accurate count with
my_count <- data.frame(table(df$Year))
But the problem is that the output is missing some values, it kinda looks like this:
Var1 Freq
1991 5
1993 4
1996 24
1997 8
1998 6
...
And I need to get the missing rows with a count of "0". Is there any way to do this? Any help is appreciated, thank you very much!
Upvotes: 0
Views: 51
Reputation: 5719
Base R method:
y <- setNames(as.data.frame(seq(min(as.year(df$Year)), max(df$Year))), "Year")
df <- df[2:1]
zz <- merge(y, df, all = TRUE)
zz[is.na(zz)] <- 0
gg <- aggregate(zz, by = list(zz$Year, zz$ID >0),
FUN = table)
gg$Group.2 <- ifelse(gg$Group.2 == TRUE, gg$Year, 0)
gg <- setNames(gg[1:2], c("Year", "Count"))
gg[with(gg, order(Year)), ]
data:
df <- read.table(text = 'ID Year
18941 2015
56511 2010
58185 2016
66183 1999
18863 2003
26635 2002
18564 2003', header = TRUE)
Output:
# > gg[with(gg, order(Year)), ]
# Year Count
# 13 1999 1
# 1 2000 0
# 2 2001 0
# 14 2002 1
# 15 2003 2
# 3 2004 0
# 4 2005 0
# 5 2006 0
# 6 2007 0
# 7 2008 0
# 8 2009 0
# 16 2010 1
# 9 2011 0
# 10 2012 0
# 11 2013 0
# 12 2014 0
# 17 2015 1
# 18 2016 1
Upvotes: 0
Reputation: 39858
A possibility involving dplyr
and tidyr
could be:
df %>%
count(Year) %>%
complete(Year = seq(min(Year), max(Year), 1), fill = list(n = 0))
Year n
<dbl> <dbl>
1 1999 1
2 2000 0
3 2001 0
4 2002 1
5 2003 2
6 2004 0
7 2005 0
8 2006 0
9 2007 0
10 2008 0
11 2009 0
12 2010 1
13 2011 0
14 2012 0
15 2013 0
16 2014 0
17 2015 1
18 2016 1
Upvotes: 2