Reputation: 1355
I chose the hflights dataset as an example.
I am trying to create a variable/column that contains the "TailNum" of the planes, but only for the planes in the top 10% by airtime.
install.packages("hflights")
library("hflights")
library("dplyr")  # provides tbl_df(), %>%, filter(), mutate(), cume_dist(), desc()
flights <- tbl_df(hflights)
flights %>% filter(cume_dist(desc(AirTime)) < 0.1) %>% mutate(new_var = TailNum)
EDIT: The resulting data frame has only 22208 observations instead of 227496. Is there a way to keep the original data frame, but add a new variable with the TailNum for the planes in the top 10% by airtime?
Upvotes: 2
Views: 4504
Reputation: 5424
You don't need the flights in mutate() after the pipe.
flights %>% filter(cume_dist(desc(AirTime)) < 0.1) %>% mutate(new = TailNum)
Also, new is a function, so best avoid that as a variable name. See ?new. As an illustration:
flights <- tbl_df(hflights)
flights %>% filter(cume_dist(desc(AirTime)) < 0.1) %>%
  mutate(new_var = TailNum, new = TailNum) %>%
  select(AirTime, TailNum, new_var)
Source: local data frame [22,208 x 3]

   AirTime TailNum new_var
1      255  N614AS  N614AS
2      257  N627AS  N627AS
3      260  N627AS  N627AS
4      268  N618AS  N618AS
5      273  N607AS  N607AS
6      278  N624AS  N624AS
7      274  N611AS  N611AS
8      269  N607AS  N607AS
9      253  N609AS  N609AS
10     315  N626AS  N626AS
..     ...     ...     ...
To retain all observations, lose the filter(). My normal approach is to use ifelse() instead. Others may be able to suggest a better solution.
f2 <- flights %>%
  mutate(cumdist = cume_dist(desc(AirTime)),
         new_var = ifelse(cumdist < 0.1, TailNum, NA)) %>%
  select(AirTime, TailNum, cumdist, new_var)
table(is.na(f2$new_var))
FALSE TRUE
22208 205288
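For anyone who wants to try the same idea without downloading hflights, here is a minimal sketch on a small made-up data frame. The tail numbers and air times are invented for illustration, and the 0.25 cutoff is chosen only so that a ten-row example has something to show. It uses dplyr's if_else() rather than base ifelse(); if_else() is stricter about types, so the "missing" value is written as NA_character_ to match the character TailNum column.

```r
library(dplyr)

# Toy data, invented for illustration (not taken from hflights).
# One AirTime is NA to mimic the missing values in real flight data.
toy <- data.frame(
  TailNum = paste0("N", 1:10),
  AirTime = c(50, 60, 70, 80, 90, 100, 110, 120, 130, NA),
  stringsAsFactors = FALSE
)

toy2 <- toy %>%
  mutate(
    # share of non-missing rows with an AirTime at least this long
    cumdist = cume_dist(desc(AirTime)),
    # keep TailNum only for rows under the cutoff, NA everywhere else
    new_var = if_else(cumdist < 0.25, TailNum, NA_character_)
  )

nrow(toy2)                  # still 10: no rows are dropped
table(is.na(toy2$new_var))  # counts of flagged vs. unflagged rows
```

Rows with a missing AirTime get a missing cumdist and therefore a missing new_var, which is the same behaviour as the ifelse() version above.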
Upvotes: 4