Jordan Wrong
Jordan Wrong

Reputation: 1245

Group By and Locate the Closest Number

Data is provided at the bottom of the page. I have 2 data frames df1 and df2.

df1:
ticker   Price
<chr>    <dbl>
SPY      200.00
AAPL     100.00

df2:
ticker  expiration   strike
<chr>    <dbl>       <dbl>
SPY      0621         180
SPY      0621         205
SPY      0719         180
SPY      0719         205
AAPL     0621          75
AAPL     0621         105
AAPL     0719          75
AAPL     0719         105

Both data frames have stock data and share a column "ticker". I would like to group df2 by 2 columns and than find the closest strike to the Price column in df1.

The output would look something like this.

df3 = df2 %>% group_by(ticker, expiration)%>% #which[abs(df1$Price - df2$strike) is closest to 0]

output:
ticker   expiration  strike
<chr>     <dbl>       <dbl>
SPY       0621         205
SPY       0719         205
AAPL      0621         105
AAPL      0719         105

Here is df1

structure(list(ticker = structure(2:1, .Label = c("AAPL", "SPY"
), class = "factor"), Price = c(200, 100)), class = "data.frame", row.names = c(NA, 
-2L))

Here is df2

structure(list(ticker = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L), .Label = c("AAPL", "SPY"), class = "factor"), expiration = c(621, 
621, 719, 719, 621, 621, 719, 719), strike = c(180, 205, 180, 
205, 75, 100, 75, 100)), class = "data.frame", row.names = c(NA, 
-8L))

I am interested in @akrun data.table answer. However I am not getting the full desired output. 0719 for SPY is missing.

library(data.table)
setDT(df2)[, Price := strike][df1, on = .(ticker, Price), roll = -Inf]
ticker expiration strike Price
1:    SPY        621    205   200
2:   AAPL        621    100   100
3:   AAPL        719    100   100

Upvotes: 2

Views: 279

Answers (3)

akrun
akrun

Reputation: 887048

We can use a rolling join after creating the combination with unique elements of 'expiration' from second dataset

library(data.table)
library(tidyr)
df1N <- crossing(df1, expiration = unique(df2$expiration))
setDT(df2)[, Price := strike][df1N, on = .(ticker, expiration, Price), roll = -Inf]
#    ticker expiration strike Price
#1:    SPY        621    205   200
#2:    SPY        719    205   200
#3:   AAPL        621    100   100
#4:   AAPL        719    100   100

Or do a full_join and then slice based on the minimum absolute difference between the 'Price' and 'strike' column after grouping by 'ticker', 'expiration'

library(dplyr)
full_join(df1, df2) %>% 
    group_by(ticker, expiration) %>% 
    slice(which.min(abs(Price - strike)))
# A tibble: 4 x 4
# Groups:   ticker, expiration [4]
#  ticker Price expiration strike
#  <fct>  <dbl>      <dbl>  <dbl>
#1 AAPL     100        621    100
#2 AAPL     100        719    100
#3 SPY      200        621    205
#4 SPY      200        719    205

Upvotes: 2

Marian Minar
Marian Minar

Reputation: 1456

A tidyverse answer:

library(tidyverse)

df2 %>% 
  left_join(df1) %>%
  mutate(diff = abs(strike - Price)) %>%
  group_by(ticker, expiration) %>%
  top_n(-1, wt = diff) %>%
  select(-Price, -diff)

Output:

Joining, by = "ticker"
# A tibble: 4 x 3
# Groups:   ticker, expiration [4]
  ticker expiration strike
  <fct>       <dbl>  <dbl>
1 SPY           621    205
2 SPY           719    205
3 AAPL          621    100
4 AAPL          719    100

Upvotes: 2

pasipasi
pasipasi

Reputation: 1226

Often times I like to use distinct() to select the smallest or largest values per group (or any other result of arrange() really). Here I'm first arranging the data by the absolute difference of strike and Price. It's very fast compared to group_by(). By default distinct() selects the first row per given combination and if we use .keep_all = TRUE the other columns are kept.

library(dplyr)

df2 %>% 
  left_join(df1) %>% 
  arrange(ticker, expiraton, abs(strike - Price)) %>% 
  distinct(ticker, expiraton, .keep_all = TRUE)
#> Joining, by = "ticker"
#>   ticker expiraton strike Price
#> 1   AAPL       621    100   100
#> 2   AAPL       719    100   100
#> 3    SPY       621    205   200
#> 4    SPY       719    205   200

Upvotes: 1

Related Questions