Adiel

Reputation: 1215

Looping through aggregated data in R

I'm trying to impute missing values in a specific column of a data frame.

My intention is to replace each missing value with the mean of its group, where the groups are defined by another column.

I've saved aggregated results using aggregate:

# Replace LotFrontage missing values by Neighborhood mean
lot_frontage_by_neighborhood = aggregate(LotFrontage ~ Neighborhood, combined, mean)

And now I want to implement something like this:

for key, group in combined.groupby("Neighborhood"):
    idx = (combined["Neighborhood"] == key) & (combined["LotFrontage"].isnull())
    combined.loc[idx, "LotFrontage"] = group["LotFrontage"].mean()

This is, of course, Python code.

I'm not sure how to achieve this in R. Can someone please help?

For example:

Neighborhood  LotFrontage
     A            20
     A            30
     B            20
     B            50
     A           <NA>

The NA record should be replaced with 25 (the average LotFrontage of all records in Neighborhood A).

Thanks

Upvotes: 0

Views: 82

Answers (1)

user4148072

Is this the idea you are looking for? You may need the which() function to determine which rows have NA values.

set.seed(1)
Neighborhood = sample(letters[1:4], 10, TRUE)
LotFrontage = rnorm(10,0,1)
LotFrontage[sample(10, 2)] = NA

# This data frame has 2 columns and 10 rows. The LotFrontage column has 2 missing values.
df = data.frame(Neighborhood = Neighborhood, LotFrontage = LotFrontage)

# Sets the missing values in the LotFrontage column to the mean of the LotFrontage values from the rows with the same Neighborhood
na_rows <- which(is.na(df$LotFrontage))
x <- df$Neighborhood[na_rows]
f <- function(n) mean(df$LotFrontage[df$Neighborhood == n], na.rm = TRUE)
# sapply() returns a plain numeric vector, whereas lapply() would return a list
df$LotFrontage[na_rows] <- sapply(as.character(x), f, USE.NAMES = FALSE)
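
If you would rather avoid an explicit loop, here is a minimal sketch of two base R alternatives, run on the small example from the question. The helper name fill_with_mean and the intermediate variables (filled_ave, filled_lookup, na_rows, lookup) are made up for illustration; only combined, Neighborhood, LotFrontage and lot_frontage_by_neighborhood come from the question.

```r
# Example data from the question
combined <- data.frame(
  Neighborhood = c("A", "A", "B", "B", "A"),
  LotFrontage  = c(20, 30, 20, 50, NA)
)

# Option 1: ave() applies FUN within each Neighborhood and returns a
# vector aligned with the original rows.
# fill_with_mean is a hypothetical helper, not a built-in function.
fill_with_mean <- function(v) replace(v, is.na(v), mean(v, na.rm = TRUE))
filled_ave <- ave(combined$LotFrontage, combined$Neighborhood, FUN = fill_with_mean)

# Option 2: reuse the aggregate() table from the question as a lookup.
# The formula interface drops NA rows by default (na.action = na.omit),
# so the table holds the means of the observed values only.
lot_frontage_by_neighborhood <- aggregate(LotFrontage ~ Neighborhood, combined, mean)
na_rows <- is.na(combined$LotFrontage)
lookup <- match(combined$Neighborhood[na_rows],
                lot_frontage_by_neighborhood$Neighborhood)
filled_lookup <- combined$LotFrontage
filled_lookup[na_rows] <- lot_frontage_by_neighborhood$LotFrontage[lookup]

# Both options give the same result; keep either one.
combined$LotFrontage <- filled_ave
```

Option 2 is probably the closest to "looping through the aggregated data": match() finds, for each row with a missing value, the row of the aggregated table with the same Neighborhood. Note that a neighborhood with no observed LotFrontage at all will not appear in the aggregated table, so those rows stay NA under Option 2 (and become NaN under Option 1).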

Upvotes: 1
