Nat

Reputation: 235

dplyr mutate with function call returning incorrect value

Can someone explain why the following dplyr mutate call, in which I apply a function taking one column as an argument to set the value of a new column, doesn't work? It doesn't seem to be calling the function on the correct value: the new season column is set according to the first value in the mon column instead of the value in its own row.

# Function to return season (winter, summer, or transition) given numerical month
getSeason <- function(m) {
  if(m >= 11 || m <= 3) 
    return(as.factor("Winter"))
  if(m >= 5 && m <= 9) 
    return(as.factor("Summer"))
  return(as.factor("Trans"))
}

getSeason(5) # Works: returns "Summer"

mon <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- as.data.frame(mon)

months %>% mutate(season=getSeason(mon))  # doesn't work: all seasons set as "Winter"

I am using R version 3.2.4 and the latest development version of dplyr. (This wasn't working in the latest release of dplyr, either.)

Upvotes: 0

Views: 1042

Answers (3)

aosmith

Reputation: 36076

The other answers nicely explained why you were having the problem.

I think this is a situation where the new function case_when could come in handy (currently available in the development version, dplyr_0.4.3.9001).

At the moment you have to use dollar sign notation to use case_when inside mutate.

months %>% mutate(season = case_when(.$mon >= 11 | .$mon <= 3 ~ "Winter",
                                     .$mon >= 5 & .$mon <= 9 ~ "Summer",
                                     TRUE ~ "Trans"))

   mon season
1    1 Winter
2    2 Winter
3    3 Winter
4    4  Trans
5    5 Summer
6    6 Summer
7    7 Summer
8    8 Summer
9    9 Summer
10  10  Trans
11  11 Winter
12  12 Winter

You can build your function using case_when instead of if or ifelse (or the new dplyr function if_else). To me the syntax seems more similar to using if than having to nest with ifelse.

getSeason <- function(m) {
    factor(
        case_when(
            m >= 11 | m <= 3 ~ "Winter",
            m >= 5 & m <= 9 ~ "Summer",
            TRUE ~ "Trans"
            ) 
        )
}

months %>% mutate(season=getSeason(mon))

   mon season
1    1 Winter
2    2 Winter
3    3 Winter
4    4  Trans
5    5 Summer
6    6 Summer
7    7 Summer
8    8 Summer
9    9 Summer
10  10  Trans
11  11 Winter
12  12 Winter

Note that the "everything else" condition is done last in case_when, and you just need to put TRUE on the left hand side of the formula to fill in everything else with the final value.
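To illustrate the ordering point, here is a small sketch on a bare vector (the vector m and the names in_order and swapped are made up for the example). case_when uses the first condition that matches, so moving TRUE to the top would swallow every element.

```r
library(dplyr)

# Hypothetical months: one per season, plus a second winter month
m <- c(1, 4, 7, 12)

# Catch-all last: the specific conditions get first pick
in_order <- case_when(
  m >= 11 | m <= 3 ~ "Winter",
  m >= 5 & m <= 9  ~ "Summer",
  TRUE             ~ "Trans"
)
print(in_order)

# Catch-all first: it matches everything, so later conditions never apply
swapped <- case_when(
  TRUE             ~ "Trans",
  m >= 11 | m <= 3 ~ "Winter",
  m >= 5 & m <= 9  ~ "Summer"
)
print(swapped)
```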

Upvotes: 5

alistaire

Reputation: 43334

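if isn't vectorized (weirdly): it acts on a single TRUE or FALSE, and || likewise looks only at the first element of each side. So getSeason only ever sees the first value in mon, i.e. 1, and every row gets Winter.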

To avoid this, use ifelse, which is vectorized:

months %>% mutate(season = factor(ifelse(mon >= 11 | mon <=3, 
                                         'Winter', ifelse(mon >= 5 & mon <= 9, 
                                                          'Summer', 'Trans'))))
#    mon season
# 1    1 Winter
# 2    2 Winter
# 3    3 Winter
# 4    4  Trans
# 5    5 Summer
# 6    6 Summer
# 7    7 Summer
# 8    8 Summer
# 9    9 Summer
# 10  10  Trans
# 11  11 Winter
# 12  12 Winter
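
A stripped-down sketch of why this works, on a made-up three-element vector m: | compares element by element, and ifelse picks a value for each element of that logical vector, where if would only consult a single TRUE or FALSE. (As far as I know, from R 4.2.0 on, passing if a condition longer than one is an error rather than a warning.)

```r
# Hypothetical example vector: a winter, a summer, and another winter month
m <- c(1, 5, 12)

# `|` works element by element and returns one logical per month
is_winter <- m >= 11 | m <= 3
print(is_winter)

# ifelse() chooses a value for every element of the logical vector
label <- ifelse(is_winter, "Winter", "Other")
print(label)
```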

If you want to add enough levels that nesting ifelses gets nasty, use cut instead, as you're really turning continuous numeric data into factor data, which is the purpose of cut.

months %>% mutate(season = droplevels(cut(mon, c(0, 3, 4, 9, 10, 12), 
                                          c('Winter', 'Trans', 'Summer', 'Trans', 'Winter'))))

Note the droplevels call, which cleans up the duplicate levels created by the repeated labels in this case; those duplicate levels will raise warnings.
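
To check which months land in which bin, it can help to call cut without labels first. A quick sketch (bins is a made-up name) showing the half-open intervals that these breaks define:

```r
mon <- 1:12

# Without labels, cut() names each level after its interval, (lower, upper]
bins <- cut(mon, breaks = c(0, 3, 4, 9, 10, 12))
print(levels(bins))

# Count how many months fall into each interval
print(table(bins))
```

Each interval excludes its lower break and includes its upper one, which is why the breaks start at 0 rather than 1.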

Upvotes: 2

johannes

Reputation: 14413

You could also use Vectorize:

# Function to return season (winter, summer, or transition) given numerical month
getSeason <- function(m) {
  if(m >= 11 || m <= 3) 
    return(as.factor("Winter"))
  if(m >= 5 && m <= 9) 
    return(as.factor("Summer"))
  return(as.factor("Trans"))
}


getSeason <- Vectorize(getSeason)

mon <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- data.frame(mon = mon)

months %>% mutate(season=getSeason(mon))
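
For what it's worth, Vectorize is a wrapper around mapply, so the scalar function is called once per element. A rough sketch with a made-up helper, season_of, that returns plain strings instead of factors (an assumption made here to keep the combined result a simple character vector):

```r
# Hypothetical scalar helper: expects a single month, returns a string
season_of <- function(m) {
  if (m >= 11 || m <= 3) return("Winter")
  if (m >= 5 && m <= 9) return("Summer")
  "Trans"
}

# Vectorize() returns a new function that loops over m via mapply()
season_vec <- Vectorize(season_of)
via_vectorize <- season_vec(c(1, 4, 7, 10, 12))
print(via_vectorize)

# The same per-element loop written out explicitly with vapply()
via_vapply <- vapply(c(1, 4, 7, 10, 12), season_of, character(1))
print(via_vapply)
```

Either way this is still a loop under the hood, so on long columns ifelse or case_when will usually be the faster choice.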

Upvotes: 5
