Reputation: 1245
Data is provided at the bottom of the page. I have 2 data frames df1 and df2.
df1:
ticker Price
<chr> <dbl>
SPY 200.00
AAPL 100.00
df2:
ticker expiration strike
<chr> <dbl> <dbl>
SPY 0621 180
SPY 0621 205
SPY 0719 180
SPY 0719 205
AAPL 0621 75
AAPL 0621 105
AAPL 0719 75
AAPL 0719 105
Both data frames have stock data and share a column "ticker". I would like to group df2 by 2 columns and than find the closest strike to the Price column in df1.
The output would look something like this.
df3 = df2 %>% group_by(ticker, expiration)%>% #which[abs(df1$Price - df2$strike) is closest to 0]
output:
ticker expiration strike
<chr> <dbl> <dbl>
SPY 0621 205
SPY 0719 205
AAPL 0621 105
AAPL 0719 105
Here is df1
structure(list(ticker = structure(2:1, .Label = c("AAPL", "SPY"
), class = "factor"), Price = c(200, 100)), class = "data.frame", row.names = c(NA,
-2L))
Here is df2
structure(list(ticker = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L), .Label = c("AAPL", "SPY"), class = "factor"), expiration = c(621,
621, 719, 719, 621, 621, 719, 719), strike = c(180, 205, 180,
205, 75, 100, 75, 100)), class = "data.frame", row.names = c(NA,
-8L))
I am interested in @akrun data.table answer. However I am not getting the full desired output. 0719 for SPY is missing.
library(data.table)
setDT(df2)[, Price := strike][df1, on = .(ticker, Price), roll = -Inf]
ticker expiration strike Price
1: SPY 621 205 200
2: AAPL 621 100 100
3: AAPL 719 100 100
Upvotes: 2
Views: 279
Reputation: 887048
We can use a rolling join after creating the combination with unique
elements of 'expiration' from second dataset
library(data.table)
library(tidyr)
df1N <- crossing(df1, expiration = unique(df2$expiration))
setDT(df2)[, Price := strike][df1N, on = .(ticker, expiration, Price), roll = -Inf]
# ticker expiration strike Price
#1: SPY 621 205 200
#2: SPY 719 205 200
#3: AAPL 621 100 100
#4: AAPL 719 100 100
Or do a full_join
and then slice
based on the min
imum abs
olute difference between the 'Price' and 'strike' column after grouping by 'ticker', 'expiration'
library(dplyr)
full_join(df1, df2) %>%
group_by(ticker, expiration) %>%
slice(which.min(abs(Price - strike)))
# A tibble: 4 x 4
# Groups: ticker, expiration [4]
# ticker Price expiration strike
# <fct> <dbl> <dbl> <dbl>
#1 AAPL 100 621 100
#2 AAPL 100 719 100
#3 SPY 200 621 205
#4 SPY 200 719 205
Upvotes: 2
Reputation: 1456
A tidyverse
answer:
library(tidyverse)
df2 %>%
left_join(df1) %>%
mutate(diff = abs(strike - Price)) %>%
group_by(ticker, expiration) %>%
top_n(-1, wt = diff) %>%
select(-Price, -diff)
Output:
Joining, by = "ticker"
# A tibble: 4 x 3
# Groups: ticker, expiration [4]
ticker expiration strike
<fct> <dbl> <dbl>
1 SPY 621 205
2 SPY 719 205
3 AAPL 621 100
4 AAPL 719 100
Upvotes: 2
Reputation: 1226
Often times I like to use distinct()
to select the smallest or largest values per group (or any other result of arrange()
really). Here I'm first arranging the data by the absolute difference of strike
and Price
. It's very fast compared to group_by()
. By default distinct()
selects the first row per given combination and if we use .keep_all = TRUE
the other columns are kept.
library(dplyr)
df2 %>%
left_join(df1) %>%
arrange(ticker, expiraton, abs(strike - Price)) %>%
distinct(ticker, expiraton, .keep_all = TRUE)
#> Joining, by = "ticker"
#> ticker expiraton strike Price
#> 1 AAPL 621 100 100
#> 2 AAPL 719 100 100
#> 3 SPY 621 205 200
#> 4 SPY 719 205 200
Upvotes: 1