Reputation: 1
I am trying to find out how to efficiently output minimum values of runtime_sec
based on of a subset from hour
column potentially using an anonymous function. Currently, I have a long way of creating a new dataframe, then joining it to the existing dataframe, but would like to do this more efficiently, without having to save out to a new dataframe. I've been looking at how to do this in map (purrr) functions but having a bit of trouble understanding. Apologies in advance if this is confusing, this is my first post on here.
Existing df:
| index | hour | runtime_sec |
|-----: |-----:| -----------:|
| 1 | 6 | 50 |
| 1 | 7 | 100 |
| 1 | 8 | 120 |
| 1 | 9 | 90 |
| 1 | 10 | 100 |
| 1 | 11 | 100 |
| 2 | 10 | 100 |
Current code:
df_min <- df %>%
group_by(index) %>%
filter(hour >= 8 & hour < 10) %>%
summarize(min_ref = min(runtime_sec))
df_join <- df %>%
left_join(df_min, by = "index")
Desired output:
| index | hour | runtime_sec | min_ref |
|----: |----: | ----: | ----: |
| 1 | 6 | 50 | 90 |
| 1 | 7 | 100 | 90 |
| 1 | 8 | 120 | 90 |
| 1 | 9 | 90 | 90 |
| 1 | 10 | 100 | 90 |
| 1 | 11 | 100 | 90 |
| 2 | 10 | 100 | 100 |
Upvotes: -1
Views: 233
Reputation: 160862
dat %>%
group_by(index) %>%
mutate(min_ref = if (any(hour >= 8 & hour < 10)) min(runtime_sec[hour >= 8 & hour < 10]) else NA) %>%
ungroup()
# # A tibble: 7 x 4
# index hour runtime_sec min_ref
# <int> <int> <int> <int>
# 1 1 6 50 90
# 2 1 7 100 90
# 3 1 8 120 90
# 4 1 9 90 90
# 5 1 10 100 90
# 6 1 11 100 90
# 7 2 10 100 NA
Your expectation of min_ref=100
for index==2
is against your rules: the hour
is not < 10
, so there should be no data that meets your condition. If you expect it to match, then you should be using hour <= 10
, in which case one can replace hour >= 8 & hour <= 10
with between(hour, 8, 10)
.
You can reduce the code slightly if you accept that Inf
is a reasonable "minimum" lacking values:
dat %>%
group_by(index) %>%
mutate(min_ref = suppressWarnings(min(runtime_sec[hour >= 8 & hour < 10]))) %>%
ungroup()
# # A tibble: 7 x 4
# index hour runtime_sec min_ref
# <int> <int> <int> <dbl>
# 1 1 6 50 90
# 2 1 7 100 90
# 3 1 8 120 90
# 4 1 9 90 90
# 5 1 10 100 90
# 6 1 11 100 90
# 7 2 10 100 Inf
though this just shortens the code a little.
Upvotes: 2