Reputation: 113
I'm trying to clean a dataset by determining if an NA should be replaced with a 0, or if left as NA.
The below is a sample data set. 'Dom.Supply' should equal the sum of the remaining factors. For example, the NAs that appear in rows 3:5 for 'Feed', 'Waste', 'Processing' and 'Other.Uses' could each be replaced with 0 as the sum of factors with values (ie. 'Food' and 'Seed') is equal to the value of 'Dom.Supply. However, in rows 1 & 2, the NAs would have to remain as the sum of 'Food' and 'Seed' is not equal to 'Dom.Supply'.
Region Country Group Item Year Production Imports Stock.Var Exports Dom.Supply Feed Seed Waste Processing Other.Uses Food
NAm.Oceania Australia Cereals Rye 1961 11 0 0 2 9 NA 1 NA NA NA 7
NAm.Oceania Australia Cereals Rye 1962 10 0 0 3 7 NA 1 NA NA NA 5
NAm.Oceania Australia Cereals Rye 1963 10 0 0 1 9 NA 2 NA NA NA 7
NAm.Oceania Australia Cereals Rye 1964 14 0 -5 0 9 NA 2 NA NA NA 7
NAm.Oceania Australia Cereals Rye 1965 11 0 5 0 16 NA 2 NA NA NA 14
I had though to use the replace
function as follows (as an example, but not the operation I wish to use), but this is a simplistic replacement of NAs with 0s rather than a test that NA=0.
data$AF2 <- 1-((replace(data$Feed, is.na(data$Feed), 0) + (replace(data$Seed,
is.na(data$Seed), 0)) / data$Dom.Supply))
Thanks!
Upvotes: 3
Views: 680
Reputation: 11617
CathG answer deserves to be in a real answer, not only as a comment (posting here as community wiki):
DF <- read.table(text = "Dom.Supply Feed Seed Waste Processing Other.Uses Food
9 NA 1 NA NA NA 7
7 NA 1 NA NA NA 5
9 NA 2 NA NA NA 7
9 NA 2 NA NA NA 7
16 NA 2 NA NA NA 14", header = TRUE)
ix <- rowSums(DF[, -1], na.rm = TRUE) == DF[, 1]
DF[ix, ][is.na(DF[ix, ])] <- 0
DF
# Dom.Supply Feed Seed Waste Processing Other.Uses Food
#1 9 NA 1 NA NA NA 7
#2 7 NA 1 NA NA NA 5
#3 9 0 2 0 0 0 7
#4 9 0 2 0 0 0 7
#5 16 0 2 0 0 0 14
EDIT
In case you have NAs
in the column Dom.Supply
, ix
will have a NA
for the corresponding row. You can change the NA
into FALSE
(as you want to leave the NAs
in the other columns as NAs
):
ix <- rowSums(DF[, -1], na.rm = TRUE) == DF[, 1]
ix[is.na(ix)] <- FALSE
DF[ix, ][is.na(DF[ix, ])] <- 0
Example, with the same original DF
:
DF[2, 1] <- NA
ix <- rowSums(DF[, -1], na.rm = TRUE) == DF[, 1]
ix[is.na(ix)] <- FALSE
DF[ix, ][is.na(DF[ix, ])] <- 0
DF
# Dom.Supply Feed Seed Waste Processing Other.Uses Food
#1 9 NA 1 NA NA NA 7
#2 NA NA 1 NA NA NA 5
#3 9 0 2 0 0 0 7
#4 9 0 2 0 0 0 7
#5 16 0 2 0 0 0 14
Upvotes: 3
Reputation: 132959
I'd use this:
DF <- read.table(text = "Dom.Supply Feed Seed Waste Processing Other.Uses Food
9 NA 1 NA NA NA 7
7 NA 1 NA NA NA 5
9 NA 2 NA NA NA 7
9 NA 2 NA NA NA 7
16 NA 2 NA NA NA 14", header = TRUE)
ix <- rowSums(DF[, -1], na.rm = TRUE) == DF[, 1]
DF[ix,] <- lapply(DF[ix,], function(x) {
x[is.na(x)] <- 0
x
})
# Dom.Supply Feed Seed Waste Processing Other.Uses Food
#1 9 NA 1 NA NA NA 7
#2 7 NA 1 NA NA NA 5
#3 9 0 2 0 0 0 7
#4 9 0 2 0 0 0 7
#5 16 0 2 0 0 0 14
Upvotes: 5