Simon H

Reputation: 75

Replace NA values of subgroups in different columns with other values in separate column

My problem:

Tom_dog <- c(1,4,NA,6,10,5)
Joe_dog <- c(2,NA,8,10,12,5)
Theo_dog <- c(5,1,6,8,NA,7)
Gus_cat <- c(9,10,14,12,13,NA)
Walz_cat <- c(NA, 9,8,7,4,2)
Ron_cat <- c(15,13,NA,2,5,6)
df <- data.frame(Tom_dog,Joe_dog,Theo_dog,Gus_cat,Walz_cat,Ron_cat)

I calculate the row means for the dogs and for the cats and attach each one to the data frame as a new column:

df$dog_mean <- rowMeans(df[ , grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)
df$cat_mean <- rowMeans(df[ , grepl("^.+(_cat)$", colnames(df))], na.rm = TRUE)

Now I would like to replace each NA in the dog columns with the dog mean from the same row, and then do the same for the cats. I tried something like this, but it didn't work:

df[ , grepl("^.+(_dog)$", colnames(df))][is.na(df[ , grepl("^.+(_dog)$", colnames(df))])] <- df$dog_mean[is.na(df[ , grepl("^.+(_dog)$", colnames(df))])]
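One likely reason the attempt fails is that `df$dog_mean` has one value per row (6 values), but it is indexed with a 6 x 3 logical matrix (18 cells). The NA positions are therefore not mapped to their rows, and any position beyond 6 yields NA. A minimal sketch that keeps the one-step idea is to translate each NA cell into its row number with base R's `row()`. It is shown for the dogs only, using the data from the question; the cats work the same way with `cat_mean`:

```r
# Dog columns from the question
Tom_dog  <- c(1, 4, NA, 6, 10, 5)
Joe_dog  <- c(2, NA, 8, 10, 12, 5)
Theo_dog <- c(5, 1, 6, 8, NA, 7)
df <- data.frame(Tom_dog, Joe_dog, Theo_dog)
df$dog_mean <- rowMeans(df[, grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)

dog_cols <- grepl("^.+(_dog)$", colnames(df))
dogs     <- as.matrix(df[, dog_cols])
na_cells <- is.na(dogs)

# row(dogs) holds each cell's row number; keeping only the NA cells
# lets us pick the mean of the matching row for every missing value
dogs[na_cells] <- df$dog_mean[row(dogs)[na_cells]]

# Write the filled matrix back into the dog columns
df[, dog_cols] <- dogs
```

`dogs[na_cells]` and `row(dogs)[na_cells]` both walk the matrix in the same column-major order, so each replacement value lines up with the cell it fills.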

Help greatly appreciated!

Upvotes: 1

Views: 103

Answers (2)

lmo

Reputation: 38510

In base R, you can do this with two passes of lapply:

# dogs
df[, grepl("_dog", names(df))] <- lapply(df[, grepl("_dog", names(df))],
                                       function(i) {i[is.na(i)] <- df$dog_mean[is.na(i)]; i})
# cats
df[, grepl("_cat", names(df))] <- lapply(df[, grepl("_cat", names(df))],
                                       function(i) {i[is.na(i)] <- df$cat_mean[is.na(i)]; i})

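Here, the list that lapply returns is assigned back into the matching columns of the data.frame. The braces {} group the two statements of the function body (separated by `;`): the first fills the NAs in column i with the row means, and the second returns the modified i, which is what lapply collects.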
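If you need more groups than dogs and cats, the same lapply pattern can be wrapped in a small helper so that every pass shares one definition. `fill_na_with_mean` is a hypothetical name, and the sketch assumes that the mean columns follow the `<group>_mean` naming used in the question:

```r
# Data from the question
Tom_dog  <- c(1, 4, NA, 6, 10, 5)
Joe_dog  <- c(2, NA, 8, 10, 12, 5)
Theo_dog <- c(5, 1, 6, 8, NA, 7)
Gus_cat  <- c(9, 10, 14, 12, 13, NA)
Walz_cat <- c(NA, 9, 8, 7, 4, 2)
Ron_cat  <- c(15, 13, NA, 2, 5, 6)
df <- data.frame(Tom_dog, Joe_dog, Theo_dog, Gus_cat, Walz_cat, Ron_cat)
df$dog_mean <- rowMeans(df[, grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)
df$cat_mean <- rowMeans(df[, grepl("^.+(_cat)$", colnames(df))], na.rm = TRUE)

# Hypothetical helper: fill NAs in the "*_<group>" columns with "<group>_mean"
fill_na_with_mean <- function(df, group) {
  cols  <- grepl(paste0("_", group, "$"), names(df))  # e.g. "_dog$"
  means <- df[[paste0(group, "_mean")]]               # e.g. df$dog_mean
  df[, cols] <- lapply(df[, cols], function(i) {
    i[is.na(i)] <- means[is.na(i)]  # same row, same mean
    i                               # return the filled column
  })
  df
}

df <- fill_na_with_mean(df, "dog")
df <- fill_na_with_mean(df, "cat")
```

Because the pattern is anchored with `$`, the mean columns themselves (`dog_mean`, `cat_mean`) are never selected.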

Upvotes: 1

rosscova

Reputation: 5590

Instead of trying to do the transformation in a single step, you might be better off with an lapply call that converts one column at a time. (I'm using magrittr's %<>% here just to avoid typing the entire first line twice.)

library( magrittr )
df[ , grepl("^.+(_dog)$", colnames(df))] %<>%
    lapply( function( x, vals ) {
        ifelse( is.na( x ), vals, x )
    },
    vals = df$dog_mean )

And the same for cats:

df[ , grepl("^.+(_cat)$", colnames(df))] %<>%
    lapply( function( x, vals ) {
        ifelse( is.na( x ), vals, x )
    },
    vals = df$cat_mean )
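If you would rather not depend on magrittr, the same `ifelse()` replacement can be written with plain assignment and a loop over the two group suffixes. This is a sketch that assumes the `dog_mean` and `cat_mean` columns have already been added as in the question. `ifelse()` works element-wise, so position k of each column is filled with `vals[k]`, the mean of row k:

```r
# Data and mean columns from the question
Tom_dog  <- c(1, 4, NA, 6, 10, 5)
Joe_dog  <- c(2, NA, 8, 10, 12, 5)
Theo_dog <- c(5, 1, 6, 8, NA, 7)
Gus_cat  <- c(9, 10, 14, 12, 13, NA)
Walz_cat <- c(NA, 9, 8, 7, 4, 2)
Ron_cat  <- c(15, 13, NA, 2, 5, 6)
df <- data.frame(Tom_dog, Joe_dog, Theo_dog, Gus_cat, Walz_cat, Ron_cat)
df$dog_mean <- rowMeans(df[, grepl("^.+(_dog)$", colnames(df))], na.rm = TRUE)
df$cat_mean <- rowMeans(df[, grepl("^.+(_cat)$", colnames(df))], na.rm = TRUE)

for (g in c("dog", "cat")) {
  cols <- grepl(paste0("^.+(_", g, ")$"), colnames(df))
  vals <- df[[paste0(g, "_mean")]]
  # Keep x where present, otherwise take the row mean at the same position
  df[, cols] <- lapply(df[, cols], function(x) ifelse(is.na(x), vals, x))
}
```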

Upvotes: 1
