Reputation: 1293
I have a data.frame that has several NAs. I already know that if a column has some zero values for some firm, then those NAs are zeros as well. What would be a good way to replace those and only those NAs with zeros.
An example:
I want to transform this
FIRMID VAR1 VAR2
FIRM1 0 1
FIRM1 NA NA
FIRM2 1 0
FIRM2 NA NA
to this
FIRMID VAR1 VAR2
FIRM1 0 1
FIRM1 0 NA
FIRM2 1 0
FIRM2 NA 0
EDIT: number of variables is possibly large and therefore I want to find a way to apply this neatly to all of them at the same time without manually entering each variable name.
Upvotes: 2
Views: 397
Reputation: 67778
Here is another ddply
alternative where you don't have to specify the variable names which the function should be applied on. By using numcolwise
, the function operates on all numerical columns.
library(plyr)
myfun <- function(x){
x[is.na(x) & (sum(!is.na(x) & x == 0) > 0)] <- 0
x}
ddply(df, .(FIRMID), numcolwise(myfun))
# FIRMID VAR1 VAR2
# 1 FIRM1 0 1
# 2 FIRM1 0 NA
# 3 FIRM2 1 0
# 4 FIRM2 NA 0
Or in base
R, where I assume that the first column contains the grouping variable (dat[ , -1]
). You could of course refer to it by name instead.
df2 <- do.call(rbind, by(df, df[ , "FIRMID"], function(dat){
sapply(dat[ , -1], function(x){
myfun(x)
})
}))
data.frame(FIRMID = df$FIRMID, df2)
# FIRMID VAR1 VAR2
# 1 FIRM1 0 1
# 2 FIRM1 0 NA
# 3 FIRM2 1 0
# 4 FIRM2 NA 0
Update 'myfun' can be written much simpler. Thanks @Arun for the suggestion!
myfun <- function(x){
x[is.na(x) & any(x == 0)] <- 0
x}
Upvotes: 4
Reputation: 341
you can use ddply here. but it would be very inefficient if the data.frame is very large. if not, then you could try:
your.data.frame<-ddply(your.data.frame,~FIRMID,function(x){
if ( any(x[!is.na(x$VAR1),"VAR1"]==0)){x[is.na(x$VAR1),"VAR1"]<-0}
if ( any(x[!is.na(x$VAR2),"VAR2"]==0)){x[is.na(x$VAR2),"VAR2"]<-0}
x})
but very inelegant
EDIT: my code before didn't worked, so I fixed it :)
Upvotes: 2
Reputation: 132706
If you don't have only integers you might need to adjust this for comparing floating point numbers:
DF <- read.table(text="FIRMID VAR1 VAR2
FIRM1 0 1
FIRM1 NA NA
FIRM2 1 0
FIRM2 NA NA", header=TRUE)
na_replace <- function(x) {
if (any(na.omit(x)==0L)) x[is.na(x)] <- 0L
x
}
library(plyr)
ddply(DF, .(FIRMID), transform,
VAR1=na_replace(VAR1),
VAR2=na_replace(VAR2))
# FIRMID VAR1 VAR2
#1 FIRM1 0 1
#2 FIRM1 0 NA
#3 FIRM2 1 0
#4 FIRM2 NA 0
Upvotes: 3