72o
72o

Reputation: 43

Is there a function in R that allows be to replace NA with mean of row criteria?

I have a tibble I am working with with NAs in a column. I want to replace the NAs (numeric column) with the mean of the column and queue_name column. For example, the NA for row two will be replaced by the mean where queue is "B". NA should be 150 for all queue B's with NA.

call_center_tbl <- tribble(~queue_name, ~talk_time,
        "A", 230,
        "B", "NA",
        "C", 150,
        "A", 200,
        "B", 100,
        "B", 200,
        "C", "NA")

I hope this makes sense. Thank you!

Upvotes: 2

Views: 86

Answers (2)

jazzurro
jazzurro

Reputation: 23574

One way with data.table is the following. In your data, talk_time is in character. So I converted the column to numeric. Then, for each queue_name, I used replace() in order to replace NA with the average value of each group.

library(data.table)

setDT(call_center_tbl)[, talk_time := as.numeric(talk_time)][,
   talk_time :=  replace(talk_time,
                         list = which(is.na(talk_time)),
                         values = mean(talk_time, na.rm = TRUE)),
   by = queue_name][]

   queue_name talk_time
1:          A       230
2:          B       150
3:          C       150
4:          A       200
5:          B       100
6:          B       200
7:          C       150

You can use nafill() instead of replace(), if you want.

setDT(call_center_tbl)[, talk_time := as.numeric(talk_time)][,
                          talk_time := nafill(x = talk_time, fill = mean(talk_time, na.rm = TRUE)),
                                  by = queue_name][]

Upvotes: 3

dc37
dc37

Reputation: 16178

You can use an ifelse statement after having grouped your values based on "queue_name":

library(dplyr)
call_center_tbl %>% 
  mutate(talk_time = as.numeric(talk_time)) %>% 
  group_by(queue_name) %>% 
  mutate(talk_time = ifelse(is.na(talk_time), mean(talk_time, na.rm =TRUE), talk_time))

# A tibble: 7 x 2
# Groups:   queue_name [3]
  queue_name talk_time
  <chr>          <dbl>
1 A                230
2 B                150
3 C                150
4 A                200
5 B                100
6 B                200
7 C                150

You can also use replace_na function from tidyr package instead of ifelse statement:

library(tidyr)
library(dplyr)
call_center_tbl %>% 
  mutate(talk_time = as.numeric(talk_time)) %>% 
  group_by(queue_name) %>% 
  mutate(talk_time = replace_na(talk_time, mean(talk_time, na.rm = TRUE)))

Upvotes: 2

Related Questions