ivan-k

Reputation: 831

dplyr: error with rowwise mutate with NA

I am getting strange errors with row-wise mutate in dplyr. Here is an example:

set.seed(1)
df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2,'b'] <- NA

sum works without trouble, but mean and median are problematic:

mutate(rowwise(df), sum(a, b, na.rm = T)) # works

mutate(rowwise(df), mean(a, b, na.rm = T))
#! Error: missing value where TRUE/FALSE needed
mutate(rowwise(df), median(a, b, na.rm = T))
#! Error: unused argument (-0.820468384118015)

Now we can try putting the NA in the first column instead:

df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2,'a'] <- NA

mutate(rowwise(df), sum(a, b, na.rm = T)) # works

mutate(rowwise(df), mean(a, b, na.rm = T))
#! no error, but returns `NaN`
mutate(rowwise(df), median(a, b, na.rm = T))
#! Error: unused argument (-0.820468384118015)

I am not sure whether I am doing something wrong here. I think the expected behavior should be the same as:

as.data.frame(apply(df, 1, mean, na.rm = T))

Thanks!

Upvotes: 1

Views: 2182

Answers (1)

mathematical.coffee

Reputation: 56915

Your error is that you are calling mean and median incorrectly.

While sum can take any number of arguments and will just add them all, mean and median take only ONE x argument: the vector whose mean/median you want.
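A minimal base R sketch of that difference, with made-up scalar values standing in for one row of the data frame. In mean(a, b), the second positional argument is matched to trim (the signature is mean(x, trim = 0, na.rm = FALSE, ...)), which would explain both results in the question: an NA in b lands in trim and causes the error, while an NA in a is dropped by na.rm and leaves an empty vector.

```r
# Base R only; the scalar values are made up for illustration.
a <- 1
b <- 5

sum(a, b)      # sum() adds every argument it is given
mean(a, b)     # b is matched to trim, so this is NOT the average of a and b
mean(c(a, b))  # combine into one vector first, then average

# An NA in the second position ends up in trim and triggers an error.
res_na_b <- tryCatch(mean(a, NA_real_, na.rm = TRUE), error = function(e) e)
inherits(res_na_b, "error")

# An NA in the first position is dropped by na.rm, leaving nothing to average.
mean(NA_real_, b, na.rm = TRUE)
```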

If a and b were vectors and you wanted the mean of the combined vector, you'd use mean(c(a, b)) rather than mean(a, b). You do the same here:

mutate(rowwise(df), mean=mean(c(a, b), na.rm = T), med=median(c(a, b), na.rm=T))

(side note: you are only calculating the mean and median of at most 2 values per row here, so the mean equals the median anyway...)
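As a cross-check in base R, the sketch below rebuilds the first data frame from the question and computes the row-wise values with apply, which is the behavior the question expected. The rowMeans comparison is an extra suggestion that is not part of the original answer, and the names row_mean and row_median are only illustrative.

```r
# Rebuild the first example from the question (base R only).
set.seed(1)
df <- data.frame(a = rnorm(5), b = rnorm(5))
df[2, "b"] <- NA

# Row-wise mean and median, ignoring NA, via apply() as in the question.
row_mean   <- unname(apply(df, 1, mean,   na.rm = TRUE))
row_median <- unname(apply(df, 1, median, na.rm = TRUE))

# rowMeans() should give the same means without an explicit apply().
same_as_rowMeans <- isTRUE(all.equal(row_mean, unname(rowMeans(df, na.rm = TRUE))))

# With at most two values per row, the mean and the median coincide.
mean_equals_median <- isTRUE(all.equal(row_mean, row_median))
```

Both flags should come out TRUE; in the row that contains the NA, the mean and the median are simply the one remaining value.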

Upvotes: 5
