Reputation: 1439
I am trying to learn R but can't quite grasp the syntax; Python is much more intuitive for me. I am playing with the nycflights13
toy dataset in R and want to figure out how to reproduce this simple pandas operation:
test = df.groupby('tailnum').agg({'flight':'count','arr_delay':'sum'})
test[(test.flight>=12)].sort_values(by='arr_delay',ascending=False)
and get
flight arr_delay
tailnum
N15910 280 7317.0
N15980 316 7134.0
N16919 251 6904.0
N228JB 388 6778.0
N14998 230 6087.0
... ... ...
N711ZX 291 -2154.0
N722TW 314 -2199.0
N721TW 318 -2285.0
N718TW 328 -2335.0
N727TW 275 -2642.0
I tried the following, but the numbers are off. I'm missing something.
test <- flights %>%
  group_by(tailnum) %>%
  filter(n() >= 12) %>%
  summarize(total_delay = sum(arr_delay))
test[order(test$total_delay, decreasing = FALSE), ]
and got
tailnum total_delay
<chr> <dbl>
N961UW -1197
N37700 -1148
N3754A -1084
N847VA -1006
...
N179JB 4449
In short, I'm a Python user who is a complete noob with R and trying to get better. Please help.
Upvotes: 1
Views: 57
Reputation: 388972
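R doesn't drop NA values by default when taking sum or mean (pandas does), so add na.rm = TRUE. Use arrange to order the data; note that decreasing = FALSE in your order() call sorts ascending, which is the opposite of ascending=False in pandas.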
library(nycflights13)
library(dplyr)
flights %>%
  group_by(tailnum) %>%
  summarise(flight = n(),
            arr_delay = sum(arr_delay, na.rm = TRUE)) %>%
  filter(flight >= 12) %>%
  arrange(desc(arr_delay))
# tailnum flight arr_delay
# <chr> <int> <dbl>
# 1 N15910 280 7317
# 2 N15980 316 7134
# 3 N16919 251 6904
# 4 N228JB 388 6778
# 5 N14998 230 6087
# 6 N192JB 319 5810
# 7 N292JB 322 5804
# 8 N12921 280 5788
# 9 N13958 259 5620
#10 N10575 289 5566
# … with 3,369 more rows
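For readers coming from Python, the NA behaviour can be sketched in plain Python, whose built-in sum propagates NaN much like R's sum does without na.rm = TRUE, whereas pandas skips NaN by default. This is only a rough illustration: the toy rows, the summarise helper, and the min_flights threshold of 2 are made up here and are not part of nycflights13, dplyr, or pandas.

```python
import math
from collections import defaultdict

# Hypothetical toy rows: (tailnum, arr_delay); NaN marks a missing delay.
rows = [
    ("A1", 10.0), ("A1", math.nan), ("A1", 5.0),
    ("B2", -3.0), ("B2", 7.0),
    ("C3", 20.0),
]

def summarise(rows, drop_missing, min_flights=2):
    """Group by tail number, count flights, sum delays, filter, sort descending."""
    groups = defaultdict(list)
    for tailnum, delay in rows:
        groups[tailnum].append(delay)

    result = []
    for tailnum, delays in groups.items():
        # drop_missing=True plays the role of na.rm = TRUE (and of the pandas default).
        kept = [d for d in delays if not math.isnan(d)] if drop_missing else delays
        # The flight count uses every row, like n() in dplyr.
        result.append((tailnum, len(delays), sum(kept)))

    # Keep only tail numbers with enough flights.
    result = [r for r in result if r[1] >= min_flights]
    # Sort by total delay, largest first; NaN totals go last, as arrange() does with NA.
    return sorted(result, key=lambda r: (math.isnan(r[2]), -r[2]))

print(summarise(rows, drop_missing=True))   # [('A1', 3, 15.0), ('B2', 2, 4.0)]
print(summarise(rows, drop_missing=False))  # [('B2', 2, 4.0), ('A1', 3, nan)]
```

With drop_missing=False, a single missing delay turns the whole total for that tail number into NaN, which is why the unadjusted R totals did not line up with the pandas ones.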
Upvotes: 2