Ziwei
Ziwei

Reputation: 57

how to replace the NA in a data frame with the average number of this data frame

I have a data frame like this:

nums  id
1233 1     
3232 2    
2334 3     
3330 1    
1445 3     
3455 3     
7632 2
NA   3
NA   1

And I can know the average "nums" of each "id" by using:

id_avg <- aggregate(nums ~ id, data = dat, FUN = mean)

What I would like to do is to replace the NA with the value of the average number of the corresponding id. for example, the average "nums" of 1,2,3 are 1000, 2000, 3000, respectively. The NA when id == 3 will be replaced by 3000, the last NA whose id == 1 will be replaced by 1000. I tried the following code to achieve this:

temp <- dat[is.na(dat$nums),]$id
dat[is.na(dat$nums),]$nums <- id_avg[id_avg[,"id"] ==temp,]$nums

However, the second part

id_avg[id_avg[,"id"] ==temp,]$nums

is always NA, which means I always pass NA to the NAs I want to replace. I don't know where I was wrong, or do you have better method to do this? Thank you

Upvotes: 0

Views: 97

Answers (3)

akuiper
akuiper

Reputation: 215137

Here is a dplyr way:

df %>% 
       group_by(id) %>% 
       mutate(nums = replace(nums, is.na(nums), as.integer(mean(nums, na.rm = T))))

# Source: local data frame [9 x 2]
# Groups: id [3]

#    nums    id
#   <int> <int>
# 1  1233     1
# 2  3232     2
# 3  2334     3
# 4  3330     1
# 5  1445     3
# 6  3455     3
# 7  7632     2
# 8  2411     3
# 9  2281     1

You essentially want to merge the id_avg back to the original data frame by the id column, so you can also use match to follow your original logic:

dat$nums[is.na(dat$nums)] <- id_avg$nums[match(dat$id[is.na(dat$nums)], id_avg$id)]
dat
#        nums id
# 1: 1233.000  1
# 2: 3232.000  2
# 3: 2334.000  3
# 4: 3330.000  1
# 5: 1445.000  3
# 6: 3455.000  3
# 7: 7632.000  2
# 8: 2411.333  3
# 9: 2281.500  1

Upvotes: 1

Zheyuan Li
Zheyuan Li

Reputation: 73415

Or you can fix it by:

dat[is.na(dat$nums),]$nums <- id_avg$nums[temp]

      nums id
1 1233.000  1
2 3232.000  2
3 2334.000  3
4 3330.000  1
5 1445.000  3
6 3455.000  3
7 7632.000  2
8 2411.333  3
9 2281.500  1

Upvotes: 4

shayaa
shayaa

Reputation: 2797

What you want is contained in the zoo package.

library(zoo)
na.aggregate.default(dat, by = dat$id)

      nums id
1 1233.000  1
2 3232.000  2
3 2334.000  3
4 3330.000  1
5 1445.000  3
6 3455.000  3
7 7632.000  2
8 2411.333  3
9 2281.500  1

Upvotes: 3

Related Questions