Reputation: 235
Can someone explain why the following dplyr mutate call, in which I apply a function taking one column as an argument to set the value of a new column, doesn't work? It doesn't seem to be calling the function on the correct value: the new season column is set according to the first value in the mon column instead of the value in its own row.
# Function to return season (winter, summer, or transition) given numerical month
getSeason <- function(m) {
  if(m >= 11 || m <= 3)
    return(as.factor("Winter"))
  if(m >= 5 && m <= 9)
    return(as.factor("Summer"))
  return(as.factor("Trans"))
}
getSeason(5) # Works: returns "Summer"
mon <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- as.data.frame(mon)
months %>% mutate(season=getSeason(mon)) # doesn't work: all seasons set as "Winter"
I am using R version 3.2.4 and the latest development version of dplyr. (This wasn't working in the latest release of dplyr, either.)
Upvotes: 0
Views: 1042
Reputation: 36076
The other answers nicely explained why you were having the problem.
I think this is a situation where the new function case_when could come in handy (currently available in the development version, dplyr_0.4.3.9001).
At the moment you have to use dollar sign notation to use case_when inside mutate.
months %>% mutate(season = case_when(.$mon >= 11 | .$mon <= 3 ~ "Winter",
                                     .$mon >= 5 & .$mon <= 9 ~ "Summer",
                                     TRUE ~ "Trans"))
mon season
1 1 Winter
2 2 Winter
3 3 Winter
4 4 Trans
5 5 Summer
6 6 Summer
7 7 Summer
8 8 Summer
9 9 Summer
10 10 Trans
11 11 Winter
12 12 Winter
You can build your function using case_when instead of if or ifelse (or the new dplyr function if_else). To me the syntax seems more similar to using if than having to nest with ifelse.
getSeason <- function(m) {
  factor(
    case_when(
      m >= 11 | m <= 3 ~ "Winter",
      m >= 5 & m <= 9 ~ "Summer",
      TRUE ~ "Trans"
    )
  )
}
months %>% mutate(season=getSeason(mon))
mon season
1 1 Winter
2 2 Winter
3 3 Winter
4 4 Trans
5 5 Summer
6 6 Summer
7 7 Summer
8 8 Summer
9 9 Summer
10 10 Trans
11 11 Winter
12 12 Winter
Note that the "everything else" condition is done last in case_when, and you just need to put TRUE on the left-hand side of the formula to fill in everything else with the final value.
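Here is a small sketch of why the fallback goes last. It assumes dplyr is installed and that case_when uses the first condition that matches each element; the values in mon and the variable names are made up for illustration.

```r
library(dplyr)

mon <- c(1, 4, 7)  # illustrative values, not the question's data frame

# Fallback last: the specific condition gets a chance to match first
fallback_last <- case_when(mon <= 3 ~ "Winter", TRUE ~ "Trans")
fallback_last
# [1] "Winter" "Trans"  "Trans"

# Fallback first: TRUE matches every element, so the later condition is never reached
fallback_first <- case_when(TRUE ~ "Trans", mon <= 3 ~ "Winter")
fallback_first
# [1] "Trans" "Trans" "Trans"
```

So the order of the formulas matters, and TRUE belongs at the end.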
Upvotes: 5
Reputation: 43334
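An if statement isn't vectorized (weirdly): it tests a single condition, and the scalar operators || and && are likewise meant for single values. So it's only using the first value in mon, i.e. 1, and you're getting all Winter.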
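To see the difference in isolation, here is a minimal base R sketch. The vector m and the name is_winter are made up for illustration; the point is only that an element-wise comparison yields one logical value per month, while a scalar test can act on just one of them.

```r
# Hypothetical example values, not taken from the question's data frame
m <- c(1, 5, 12)

# Element-wise comparison with | gives one logical value per month
is_winter <- m >= 11 | m <= 3
is_winter
# [1]  TRUE FALSE  TRUE

# A scalar test such as if() can only act on a single value;
# in the question that was the first one
is_winter[1]
# [1] TRUE
```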
To avoid this, use ifelse, which is vectorized:
months %>% mutate(season = factor(ifelse(mon >= 11 | mon <= 3,
                                         'Winter', ifelse(mon >= 5 & mon <= 9,
                                                          'Summer', 'Trans'))))
# mon season
# 1 1 Winter
# 2 2 Winter
# 3 3 Winter
# 4 4 Trans
# 5 5 Summer
# 6 6 Summer
# 7 7 Summer
# 8 8 Summer
# 9 9 Summer
# 10 10 Trans
# 11 11 Winter
# 12 12 Winter
If you want to add enough levels that nesting ifelse calls gets nasty, use cut instead, as you're really turning continuous numeric data into factor data, which is the purpose of cut.
months %>% mutate(season = droplevels(cut(mon, c(0, 3, 4, 9, 10, 12),
                                          c('Winter', 'Trans', 'Summer', 'Trans', 'Winter'))))
Note the droplevels call, which cleans up the duplicate levels in this case; the duplicated labels can raise warnings.
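If the duplicated labels are a concern, one possible variation (a sketch, not the method above) is to let cut produce its default interval labels and then map the interval codes to season names through a lookup vector. The names bins and season_by_bin are hypothetical.

```r
mon <- 1:12

# Default labels show the right-closed intervals that cut() creates from the breaks
bins <- cut(mon, breaks = c(0, 3, 4, 9, 10, 12))
levels(bins)
# [1] "(0,3]"   "(3,4]"   "(4,9]"   "(9,10]"  "(10,12]"

# Map each interval code (1 to 5) to a season name, then build the factor
season_by_bin <- c("Winter", "Trans", "Summer", "Trans", "Winter")
season <- factor(season_by_bin[as.integer(bins)])
```

Because factor is called on a plain character vector here, each season name ends up as a single level.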
Upvotes: 2
Reputation: 14413
You could also use Vectorize:
# Function to return season (winter, summer, or transition) given numerical month
getSeason <- function(m) {
  if(m >= 11 || m <= 3)
    return(as.factor("Winter"))
  if(m >= 5 && m <= 9)
    return(as.factor("Summer"))
  return(as.factor("Trans"))
}
getSeason <- Vectorize(getSeason)
mon <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- data.frame(mon = mon)
months %>% mutate(season=getSeason(mon))
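Roughly the same idea can be spelled out by hand: keep the scalar function and apply it to each element yourself. This is only a sketch under my own assumptions; get_season_one and get_season are hypothetical names, and the scalar function returns plain strings so that the factor is built once at the end.

```r
# Scalar version: expects a single month number
get_season_one <- function(m) {
  if (m >= 11 || m <= 3) return("Winter")
  if (m >= 5 && m <= 9) return("Summer")
  "Trans"
}

# Apply the scalar function to each element, then convert the result to a factor
get_season <- function(m) {
  factor(vapply(m, get_season_one, character(1)))
}

seasons <- get_season(c(1, 4, 7, 12))
```

Using vapply with character(1) makes the call fail loudly if the scalar function ever returns something other than one string.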
Upvotes: 5