transit_desert
transit_desert

Reputation: 1

R: How to use the map function to find min value within a subset of columns

I am trying to find out how to efficiently output minimum values of runtime_sec based on of a subset from hour column potentially using an anonymous function. Currently, I have a long way of creating a new dataframe, then joining it to the existing dataframe, but would like to do this more efficiently, without having to save out to a new dataframe. I've been looking at how to do this in map (purrr) functions but having a bit of trouble understanding. Apologies in advance if this is confusing, this is my first post on here.

Existing df:

| index | hour | runtime_sec |
|-----: |-----:| -----------:|
| 1     | 6    | 50          |
| 1     | 7    | 100         |
| 1     | 8    | 120         |
| 1     | 9    | 90          |
| 1     | 10   | 100         |
| 1     | 11   | 100         |
| 2     | 10   | 100         |

Current code:

df_min <- df %>% 
        group_by(index) %>% 
        filter(hour >= 8 & hour < 10) %>% 
        summarize(min_ref = min(runtime_sec))
df_join <- df %>% 
        left_join(df_min, by = "index")

Desired output:

| index | hour | runtime_sec | min_ref |
|----:  |----: | ----:       | ----:   |
| 1     | 6    | 50          | 90      |
| 1     | 7    | 100         | 90      |
| 1     | 8    | 120         | 90      |
| 1     | 9    | 90          | 90      |
| 1     | 10   | 100         | 90      |
| 1     | 11   | 100         | 90      |
| 2     | 10   | 100         | 100     |

Upvotes: -1

Views: 233

Answers (1)

r2evans
r2evans

Reputation: 160862

dat %>%
  group_by(index) %>%
  mutate(min_ref = if (any(hour >= 8 & hour < 10)) min(runtime_sec[hour >= 8 & hour < 10]) else NA) %>%
  ungroup()
# # A tibble: 7 x 4
#   index  hour runtime_sec min_ref
#   <int> <int>       <int>   <int>
# 1     1     6          50      90
# 2     1     7         100      90
# 3     1     8         120      90
# 4     1     9          90      90
# 5     1    10         100      90
# 6     1    11         100      90
# 7     2    10         100      NA

Your expectation of min_ref=100 for index==2 is against your rules: the hour is not < 10, so there should be no data that meets your condition. If you expect it to match, then you should be using hour <= 10, in which case one can replace hour >= 8 & hour <= 10 with between(hour, 8, 10).

You can reduce the code slightly if you accept that Inf is a reasonable "minimum" lacking values:

dat %>%
  group_by(index) %>%
  mutate(min_ref = suppressWarnings(min(runtime_sec[hour >= 8 & hour < 10]))) %>%
  ungroup()
# # A tibble: 7 x 4
#   index  hour runtime_sec min_ref
#   <int> <int>       <int>   <dbl>
# 1     1     6          50      90
# 2     1     7         100      90
# 3     1     8         120      90
# 4     1     9          90      90
# 5     1    10         100      90
# 6     1    11         100      90
# 7     2    10         100     Inf

though this just shortens the code a little.

Upvotes: 2

Related Questions