Reputation: 4645
I have a question about replacing NA values with numeric values in my data. If all rows in a group are NA, replace them with 100; otherwise, if there is any numeric value in the group, replace the NAs with that numeric value.
Similar posts: "How to copy value of a cell to other rows based on the value of other two columns?" and "replace NA value with the group value".
However, I would rather have a direct dplyr solution, and the answers to those two posts use the zoo package.
df = data.frame(gr=gl(3,3),id=c("NA","NA","NA",131,"NA","NA",232,232,"NA"))
> df
gr id
1 1 NA
2 1 NA
3 1 NA
4 2 131
5 2 NA
6 2 NA
7 3 232
8 3 232
9 3 NA
It looks simple, so I tried:
library(dplyr)
df%>%
group_by(gr)%>%
mutate(id_new=ifelse(all(is.na(id)),100,ifelse(any(is.numeric(id)),id[which(is.numeric(id))],NA)))
# A tibble: 9 x 3
# Groups: gr [3]
gr id id_new
<fctr> <fctr> <lgl>
1 1 NA NA
2 1 NA NA
3 1 NA NA
4 2 131 NA
5 2 NA NA
6 2 NA NA
7 3 232 NA
8 3 232 NA
9 3 NA NA
All rows turn out to be NA. Why?
The expected output:
gr id id_new
<fctr> <fctr> <dbl>
1 1 NA 100
2 1 NA 100
3 1 NA 100
4 2 131 131
5 2 NA 131
6 2 NA 131
7 3 232 232
8 3 232 232
9 3 NA 232
Upvotes: 2
Views: 513
Reputation: 3629
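Just convert id to numeric. Go through as.character() first: in your data frame id is a factor, and as.numeric() on a factor returns the internal level codes rather than the values. Also, for the else branch of the ifelse, I used max just in case the value is not unique within a group. Change it to whatever suits you. I don't think there is a need for the complex nested else statement.
library(dplyr)

df %>%
  group_by(gr) %>%
  # as.character() first, so the labels (not the factor codes) are converted
  mutate(id = as.numeric(as.character(id))) %>%
  mutate(id_new = ifelse(all(is.na(id)), 100, max(id, na.rm = TRUE)))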
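One pitfall worth checking when converting id: if the column is a factor (which is what data.frame() produces from strings on R versions before 4.0.0), as.numeric() gives the level codes, not the numbers you see printed. A minimal sketch, using a hypothetical vector f that stands in for the id column:

```r
# Hypothetical stand-in for the id column when it is stored as a factor
f <- factor(c("NA", "131", "232"))

codes  <- as.numeric(f)                # internal level codes, not the labels
values <- as.numeric(as.character(f))  # labels parsed as numbers; "NA" becomes NA

print(codes)
print(values)
```

Only the second conversion gives values that is.na() and max() can work with as intended.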
Upvotes: 2
Reputation: 7630
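The problem here is that your id column is a factor, not numeric, so is.numeric(id) returns FALSE. For the same reason, is.na(id) is FALSE everywhere (the strings "NA" are not missing values), so both conditions fail and every row falls through to the final NA. This happened because you constructed the column with character "NA" values. Construct it with real NA values like so: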
df = data.frame(gr=gl(3,3),id=c(NA, NA,NA,131,NA,NA,232,232,NA))
df %>%
group_by(gr) %>% mutate(id_new=ifelse(all(is.na(id)),100,ifelse(any(is.numeric(id)),id[which(is.numeric(id))],NA)))
# A tibble: 9 x 3
# Groups: gr [3]
gr id id_new
<fct> <dbl> <dbl>
1 1 NA 100
2 1 NA 100
3 1 NA 100
4 2 131 131
5 2 NA 131
6 2 NA 131
7 3 232 232
8 3 232 232
9 3 NA 232
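Note that is.numeric(id) returns a single TRUE or FALSE for the whole column, so id[which(is.numeric(id))] is really just id[1]. It works on this data because the first row of groups 2 and 3 happens to be non-NA. If that is not guaranteed in your real data, picking the first non-NA value explicitly might be safer. A sketch of that variant (my own variation, not the code from the question), assuming one value per group is what you want:

```r
library(dplyr)

df <- data.frame(gr = gl(3, 3), id = c(NA, NA, NA, 131, NA, NA, 232, 232, NA))

res <- df %>%
  group_by(gr) %>%
  # one value per group: 100 if the group is all NA, else its first non-NA id
  mutate(id_new = if (all(is.na(id))) 100 else id[!is.na(id)][1]) %>%
  ungroup()

print(res)
```

A plain if/else is enough here because the condition is a single value per group, and mutate() recycles the length-one result across the group's rows.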
You don't have to do anything special to make the NA values numeric. c() will coerce them from logical when you combine them with numeric values. Before, since character has a higher priority in coercion, c() was coercing that column to character when it contained "NA" instead of NA, and data.frame() was then converting it to a factor because of the default stringsAsFactors = TRUE (the default in R versions before 4.0.0).
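You can check the coercion yourself with a couple of short vectors. This is just an illustration: x_num, x_chr and d are made-up names, and stringsAsFactors = TRUE is passed explicitly so the result is the same on newer R versions where the default is FALSE.

```r
x_num <- c(NA, 131)     # logical NA is coerced to numeric
x_chr <- c("NA", 131)   # 131 is coerced to the string "131"

print(class(x_num))     # "numeric"
print(class(x_chr))     # "character"
print(is.na(x_chr))     # FALSE FALSE: the string "NA" is not a missing value

# Reproduce the old data.frame() default explicitly
d <- data.frame(id = x_chr, stringsAsFactors = TRUE)
print(class(d$id))      # "factor"
print(is.numeric(d$id)) # FALSE
```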
Upvotes: 1