dplyr's arrange operation does not seem to be commutative within a remote source, when does it matter?

Question

Using the cars dataset (with a postgres connection con_psql):

tmp_cars_sdf <-
    copy_to(con_psql, cars, name = "tmp_cars_sdf", overwrite = T)

We see that the following two sequences of operations, that only interchange the order of filter and arrange, lead to different SQL translations:

tmp_cars_sdf %>%
    filter(speed == 4) %>%
    arrange(dist) %>%
    sql_render

#  SELECT *
#     FROM "tmp_cars_sdf"
# WHERE ("speed" = 4.0)
# ORDER BY "dist"

vs

tmp_cars_sdf %>%
    arrange(dist) %>%
    filter(speed == 4) %>%
    sql_render

#  SELECT *
#     FROM (SELECT *
#               FROM "tmp_cars_sdf"
#           ORDER BY "dist") "dbplyr_006"
# WHERE ("speed" = 4.0)

I am not an expert in SQL but it seems like the ordering is only guaranteed in the former, and not the latter, when collecting results, or using the remote table in further operations:

Is order in a subquery guaranteed to be preserved? https://dba.stackexchange.com/questions/82930/database-implementations-of-order-by-in-a-subquery

What should I do about this?

Edit:

Having done some further investigation I am not sure whether this is important from a data analysis point of view (i.e. treating remote tables just like data frames in ram).

tmp_cars_sdf %>%
    arrange(dist) %>%
    group_by(speed) %>%
    filter(dist > 10) %>%
    mutate(lag_dist = lag(dist)) %>%
    sql_render

#  SELECT "speed", "dist", LAG("dist", 1, NULL) OVER 
# (PARTITION BY "speed" ORDER BY "dist") AS "lag_dist"
# FROM (SELECT *
#           FROM "tmp_cars_sdf"
#       ORDER BY "dist") "dbplyr_014"
# WHERE ("dist" > 10.0)

While it seems that the arrange() seems to be translated into ORDER BY in a subquery, it turns out that this is not important, at least for calculating a new variable based on the order since the ordering information is not 'forgotten'.

dplyr's arrange operation does not seem to be commutative within a remote source, when does it matter?

Edit:

Answers (1)

Related Questions

dplyr&#39;s arrange operation does not seem to be commutative within a remote source, when does it matter?

Edit:

Answers (1)

Related Questions

dplyr's arrange operation does not seem to be commutative within a remote source, when does it matter?