Reputation: 18585
I have a data frame resembling the one below.
Group Expenditure Date
A 56434 22 June 2014
B 54231 1 July 2013
B 1412 9 May 2011
A NA 28 July 2009
A NA 3 July 2009
C 98 2 July 1999
C NA 14 July 2004
I'm interested in creating missing values reports for the Expenditure column. One value should return the number of missing values per column, this is solved with use of the following code
sapply(exp.dta, function(x) sum(is.na(x)))
In addition, I would like to report the number of missing values for each date. The date column is formatted as a proper date with use of the as.Date function. As for now, I'm not interested in reporting missing values per subgroup.
Upvotes: 1
Views: 768
Reputation: 11514
Try this:
library(plyr)
ddply(your.data, .(Date), summarize, nNA = sum(is.na(Expenditure))
This splits up the data into subgroups by Date
and applies the function sum(is.na())
to the Expenditure
column for these subgroups.
For instance,
df <- read.table(text="Group Expenditure Date
A 56434 22June2014
B 54231 1July2013
B 1412 9May2011
A NA 28July2009
A NA 3July2009
C 98 2July1999
C NA 14July2004 ", sep="", header=T)
ddply(df, .(Date), summarize, nNA=sum(is.na(Expenditure)))
yields:
Date nNA
1 14July2004 1
2 1July2013 0
3 22June2014 0
4 28July2009 1
5 2July1999 0
6 3July2009 1
7 9May2011 0
There are also several base
-solutions. Here some examples:
Using by
by(df, df$Date, function(x) sum(is.na(x$Expenditure)))
Using tapply
with(df, tapply(Expenditure, Date, function(x) sum(is.na(x))))
Using aggregate
(hat tip to @user20650)
aggregate(df$Expenditure, by=list(df$Date), FUN= function(x) sum(is.na(x)))
which all give the same result but with slightly different formats. Pick which one you like the most. For a more general treatment, this kind of problem is called "split-apply-combine", see e.g. here.
Upvotes: 3
Reputation: 99321
Following the code you already have written, you could add split
to it,
dat <- read.table(h=T, text = "Group Expenditure Date
A 56434 22-June-2014
B 54231 1-July-2013
B 1412 9-May-2011
A NA 28-July-2009
A NA 3-July-2009
C 98 2-July-1999
C NA 14-July-2004")
> sapply(split(dat$Expenditure, dat$Group), function(x) sum(is.na(x)))
# A B C
# 2 0 1
or for each date,
> s <- split(dat$Expenditure, dat$Date)
> as.matrix(sapply(s, function(x) sum(is.na(x))))
# [,1]
# 14-July-2004 1
# 1-July-2013 0
# 22-June-2014 0
# 28-July-2009 1
# 2-July-1999 0
# 3-July-2009 1
# 9-May-2011 0
Upvotes: 2
Reputation: 3615
Or, using dplyr
:
library('dplyr')
summarize(group_by(df, Date), nNA = sum(is.na(Expenditure)))
Upvotes: 1