Alexander

Reputation: 4645

Replacing NA values with numeric values in groups

I have a question about replacing NA values with numeric values in my data. If all rows in a group are NA, replace them with 100; otherwise, if the group contains a numeric value, replace the NAs with that value.

Similar posts:

How to copy value of a cell to other rows based on the value of other two columns?

replace NA value with the group value

However, I would prefer a direct dplyr solution, and both of those posts use the zoo package.

df = data.frame(gr=gl(3,3),id=c("NA","NA","NA",131,"NA","NA",232,232,"NA"))

> df
  gr  id
1  1  NA
2  1  NA
3  1  NA
4  2 131
5  2  NA
6  2  NA
7  3 232
8  3 232
9  3  NA

It looks simple, so I tried:

library(dplyr)
df%>%
  group_by(gr)%>%
  mutate(id_new=ifelse(all(is.na(id)),100,ifelse(any(is.numeric(id)),id[which(is.numeric(id))],NA)))

# A tibble: 9 x 3
# Groups:   gr [3]
      gr     id id_new
  <fctr> <fctr>  <lgl>
1      1     NA     NA
2      1     NA     NA
3      1     NA     NA
4      2    131     NA
5      2     NA     NA
6      2     NA     NA
7      3    232     NA
8      3    232     NA
9      3     NA     NA

All rows turn out to be NA. Why?

The expected output:

      gr     id id_new
  <fctr> <fctr>  <dbl>
1      1     NA     100
2      1     NA     100
3      1     NA     100
4      2    131     131
5      2     NA     131
6      2     NA     131
7      3    232     232
8      3    232     232
9      3     NA     232

Upvotes: 2

Views: 513

Answers (2)

hpesoj626

Reputation: 3629

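Just convert id to numeric. If id is a factor, go through as.character() first, because as.numeric() on a factor returns the integer level codes rather than the values. Also, for the else branch of the ifelse, I used max in case the value is not unique within a group. Change it to whatever suits you. I don't think there is a need for the nested ifelse.

df%>%
  group_by(gr)%>%
  # the "NA" strings become real NA values here (with a coercion warning)
  mutate(id = as.numeric(as.character(id))) %>%
  mutate(id_new=ifelse(all(is.na(id)),100,max(id, na.rm = TRUE)))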
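
As a variant of the same idea, here is a minimal sketch that takes the first non-missing value of each group instead of the maximum. It assumes id is already a numeric column with real NA values (not the string "NA"); the name res and the scalar if/else are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Numeric column with real NA values (assumption: no "NA" strings)
df <- data.frame(gr = gl(3, 3),
                 id = c(NA, NA, NA, 131, NA, NA, 232, 232, NA))

res <- df %>%
  group_by(gr) %>%
  # A scalar if/else is evaluated once per group;
  # mutate() recycles the single result to every row of the group
  mutate(id_new = if (all(is.na(id))) 100 else id[!is.na(id)][1]) %>%
  ungroup()

res$id_new
# 100 100 100 131 131 131 232 232 232
```

Because the condition is a single TRUE/FALSE per group, a plain if/else is enough here and avoids the vectorised ifelse altogether.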

Upvotes: 2

De Novo

Reputation: 7630

The problem here is that your id column is a factor, not numeric, so is.numeric(id) returns FALSE. This is because you constructed it with "character" NA values, which is.na() does not treat as missing either, so both conditions fail and every row falls through to the final NA. Construct it with "numeric" NA values like so:

df = data.frame(gr=gl(3,3),id=c(NA, NA,NA,131,NA,NA,232,232,NA))

df %>%
  group_by(gr) %>%
  mutate(id_new=ifelse(all(is.na(id)),100,ifelse(any(is.numeric(id)),id[which(is.numeric(id))],NA)))
# A tibble: 9 x 3
# Groups:   gr [3]
  gr       id id_new
  <fct> <dbl>  <dbl>
1 1        NA    100
2 1        NA    100
3 1        NA    100
4 2       131    131
5 2        NA    131
6 2        NA    131
7 3       232    232
8 3       232    232
9 3        NA    232

You don't have to do anything special to make the NA values "numeric". c() will coerce them from "logical" when you pass them with "numeric" values. Before, since "character" has a higher priority, c() was coercing that column to "character" when it contained "NA" instead of NA, and data.frame() was converting it to "factor" because of the default stringsAsFactors = TRUE (the default before R 4.0.0).
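
The coercion described above can be checked with a few lines of base R. This is only an illustrative sketch; the variable names (num_id, chr_id, fct_id, codes, values) are made up for the example.

```r
# NA is logical on its own, but c() promotes it to match the other elements
num_id <- c(NA, 131, 232)
chr_id <- c("NA", 131, 232)

class(num_id)   # "numeric"
class(chr_id)   # "character"
is.na(chr_id)   # all FALSE: the string "NA" is not a missing value

# Mimic what data.frame() did by default before R 4.0.0
fct_id <- factor(chr_id)

# as.numeric() on a factor gives the integer level codes, not the values
codes <- as.numeric(fct_id)

# Going through as.character() recovers the values; "NA" becomes a real NA
values <- suppressWarnings(as.numeric(as.character(fct_id)))
values
# NA 131 232
```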

Upvotes: 1
