Reputation: 43
I have a tibble I am working with with NAs in a column. I want to replace the NAs (numeric column) with the mean of the column and queue_name column. For example, the NA for row two will be replaced by the mean where queue is "B". NA should be 150 for all queue B's with NA.
call_center_tbl <- tribble(~queue_name, ~talk_time,
"A", 230,
"B", "NA",
"C", 150,
"A", 200,
"B", 100,
"B", 200,
"C", "NA")
I hope this makes sense. Thank you!
Upvotes: 2
Views: 86
Reputation: 23574
One way with data.table is the following. In your data, talk_time
is in character. So I converted the column to numeric. Then, for each queue_name
, I used replace()
in order to replace NA with the average value of each group.
library(data.table)
setDT(call_center_tbl)[, talk_time := as.numeric(talk_time)][,
talk_time := replace(talk_time,
list = which(is.na(talk_time)),
values = mean(talk_time, na.rm = TRUE)),
by = queue_name][]
queue_name talk_time
1: A 230
2: B 150
3: C 150
4: A 200
5: B 100
6: B 200
7: C 150
You can use nafill()
instead of replace()
, if you want.
setDT(call_center_tbl)[, talk_time := as.numeric(talk_time)][,
talk_time := nafill(x = talk_time, fill = mean(talk_time, na.rm = TRUE)),
by = queue_name][]
Upvotes: 3
Reputation: 16178
You can use an ifelse
statement after having grouped your values based on "queue_name":
library(dplyr)
call_center_tbl %>%
mutate(talk_time = as.numeric(talk_time)) %>%
group_by(queue_name) %>%
mutate(talk_time = ifelse(is.na(talk_time), mean(talk_time, na.rm =TRUE), talk_time))
# A tibble: 7 x 2
# Groups: queue_name [3]
queue_name talk_time
<chr> <dbl>
1 A 230
2 B 150
3 C 150
4 A 200
5 B 100
6 B 200
7 C 150
You can also use replace_na
function from tidyr
package instead of ifelse
statement:
library(tidyr)
library(dplyr)
call_center_tbl %>%
mutate(talk_time = as.numeric(talk_time)) %>%
group_by(queue_name) %>%
mutate(talk_time = replace_na(talk_time, mean(talk_time, na.rm = TRUE)))
Upvotes: 2