mpds

Reputation: 1

How to check if NA within replace() function in R?

In my dataset, the duration of an activity is given either in hours (column duration_hours) or in minutes (column duration_minutes). If it is given in hours, the duration_minutes column is empty (NA), and vice versa.
I now want to convert the values given in minutes into hours by dividing them by 60 (minutes).

To do so I tried this command:

df <- df %>% mutate(duration_recoded = replace(duration_minutes, !is.na(duration_minutes), duration_minutes / 60))

However, the command produces incorrect results and this warning message is shown:

Warning message:
In x[list] <- values :
  number of items to replace is not a multiple of replacement length

Can anybody tell me where my mistake is?

Here's some sample data:

df <- structure(list(duration_hours = c(1, NA, 2, NA, 1), duration_minutes = c(NA, 25, NA, 30, NA)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 139

Answers (2)

Adam Sampson

Reputation: 2021

The problem in your code is that duration_minutes is a vector, so dividing it by 60 is a vectorized operation that returns one value per row. Let's use an example df:

# A tibble: 7 x 1
  duration_minutes
             <dbl>
1               10
2               20
3               30
4               NA
5               50
6               NA
7               60

In this case, df$duration_minutes / 60 results in:

0.1666667 0.3333333 0.5000000        NA 0.8333333        NA 1.0000000

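That means you are trying to fill only the non-NA positions (5 of them here) with a replacement vector that has one value per row (7 here). replace() takes the replacement values from the start of that vector rather than from the matching rows, which is why the results are wrong and why the warning says number of items to replace is not a multiple of replacement length.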

To fix this, the replacement must either be a single value (for example the result of an aggregating function such as sum(), mean(), or first()) or contain exactly one value per replaced position. The coalesce() function used in the other answer avoids the problem by taking, row by row, the first non-missing value among its arguments.
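To make the length mismatch concrete, here is a minimal base R sketch using the duration_minutes values from the question's sample data. The names `mins`, `sel`, and `fixed` are illustrative and do not come from the original post.

```r
# duration_minutes column from the question's sample data
mins <- c(NA, 25, NA, 30, NA)
sel  <- !is.na(mins)   # TRUE at the two positions that hold a value

length(mins / 60)      # number of replacement values supplied by the original call
sum(sel)               # number of positions actually being replaced

# Subsetting the replacement with the same index keeps the lengths equal,
# so each selected position receives its own converted value.
fixed <- replace(mins, sel, mins[sel] / 60)
fixed                  # 25/60 and 30/60 in positions 2 and 4, NA elsewhere
```

This only converts the minutes column. Combining it with duration_hours still needs a separate step.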

Upvotes: 0

Tim Biegeleisen

Reputation: 521289

We can make use of the coalesce() function from the dplyr package here:

library(dplyr)
df <- df %>% mutate(duration_recoded = coalesce(duration_hours, duration_minutes / 60))

This should work because if duration_hours is not NA, coalesce() simply takes it and assigns it to duration_recoded. If duration_hours is NA, coalesce() skips it and takes duration_minutes divided by 60 instead.
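As a quick check, the following sketch applies the coalesce() call to the sample data from the question. The data frame is rebuilt with tibble() for brevity. The base R `ifelse()` line and the name `base_version` are added here as an assumed equivalent for comparison and are not part of the original answer.

```r
library(dplyr)

# Sample data from the question, rebuilt with tibble()
df <- tibble(
  duration_hours   = c(1, NA, 2, NA, 1),
  duration_minutes = c(NA, 25, NA, 30, NA)
)

# Take duration_hours where present, otherwise minutes converted to hours
df <- df %>%
  mutate(duration_recoded = coalesce(duration_hours, duration_minutes / 60))

# Base R comparison (illustrative): same row-wise choice via ifelse()
base_version <- ifelse(is.na(df$duration_hours),
                       df$duration_minutes / 60,
                       df$duration_hours)

df$duration_recoded
```

Both approaches work row by row, so the replacement always has the same length as the column and no recycling warning is raised.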

Upvotes: 2
