Reputation: 3
I am quite new on R so still learning many things. I have been searching around but I cannot find the appropriate answer to my problem. I have these two datasets:
d1
Criteria Order Low High
1 a 1 0 10
2 a 1 11 20
3 a 1 21 30
4 b 1 0 13
5 b 1 14 32
6 a 2 5 22
7 a 2 0 4
8 b 2 0 18
and then d2
Criteria Order Final
1 a 1 13
2 b 2 12
3 a 1 8
4 a 2 2
I was wondering if there is any way to add an extra column to d1
when d2$Final
is within d1$Low
and d1$High
and both Criteria and Order match. What I am expecting to obtain would be something like:
Criteria Order Low High Final
1 a 1 0 10 8
2 a 1 11 20 13
3 a 1 21 30 NA
4 b 1 0 13 NA
5 b 1 14 32 NA
6 a 2 5 22 NA
7 a 2 0 4 2
8 b 2 0 18 12
Or even a numeric output 1/0 for true or false in the Final column would be ok.
Thanks in advance
Upvotes: 0
Views: 51
Reputation: 269371
This uses SQL to create a complex join. The need for [...] around Order
is to distinguish it from the keyword of the same name.
library(sqldf)
sqldf("select d1.*, d2.Final
from d1
left join d2 on d1.Criteria = d2.Criteria and
d1.[Order] = d2.[Order] and
d2.Final between d1.Low and d1.High")
giving the same output shown in the question:
Criteria Order Low High Final
1 a 1 0 10 8
2 a 1 11 20 13
3 a 1 21 30 NA
4 b 1 0 13 NA
5 b 1 14 32 NA
6 a 2 5 22 NA
7 a 2 0 4 2
8 b 2 0 18 12
Data in reproducible form:
Lines1 <- "
Criteria Order Low High
1 a 1 0 10
2 a 1 11 20
3 a 1 21 30
4 b 1 0 13
5 b 1 14 32
6 a 2 5 22
7 a 2 0 4
8 b 2 0 18"
Lines2 <- "
Criteria Order Final
1 a 1 13
2 b 2 12
3 a 1 8
4 a 2 2"
d1 <- read.table(text = Lines1)
d2 <- read.table(text = Lines2)
Upvotes: 2
Reputation: 160407
If your data is "large-ish", this solution will not be for you: the cartesian join will explode beyond what a "standard" computer will tolerate memory-wise.
If, however, your data is small enough (very relative term), then you can do a cartesian-join (also known as a full or full outer join) and filter the results. (This solution is an implementation of one section from https://www.mango-solutions.com/blog/in-between-a-rock-and-a-conditional-join. There are other sections that discuss SQL and fuzzyjoin
, both worthy candidates.)
Three dialects, depending on your preference.
Base R
a <- merge(d2, d1, all.x=T)
a <- transform(a, Final = ifelse(Low <= Final & Final <= High, Final, NA))
a[!duplicated(a),]
# Criteria Order Final Low High
# 1 a 1 NA 0 10
# 2 a 1 13 11 20
# 3 a 1 NA 21 30
# 4 a 1 8 0 10
# 5 a 1 NA 11 20
# 7 a 2 NA 5 22
# 8 a 2 2 0 4
# 9 b 2 12 0 18
It has an extra row, trying to work that elegantly ...
dplyr
library(dplyr)
full_join(d1, d2) %>%
mutate(Final = if_else(between(Final, Low, High), Final, NA_integer_)) %>%
group_by(Criteria, Order, Low, High) %>%
summarise(Final = coalesce(Final)[1]) %>%
ungroup()
# Joining, by = c("Criteria", "Order")
# # A tibble: 8 x 5
# Criteria Order Low High Final
# <chr> <int> <int> <int> <int>
# 1 a 1 0 10 NA
# 2 a 1 11 20 13
# 3 a 1 21 30 NA
# 4 a 2 0 4 2
# 5 a 2 5 22 NA
# 6 b 1 0 13 NA
# 7 b 1 14 32 NA
# 8 b 2 0 18 12
data.table
library(data.table)
as.data.table(d2)[d1, on = .(Final > Low, Final < High, Criteria, Order),
.(Criteria, Order, Low, High, x.Final)]
# Criteria Order Low High x.Final
# 1: a 1 0 10 8
# 2: a 1 11 20 13
# 3: a 1 21 30 NA
# 4: b 1 0 13 NA
# 5: b 1 14 32 NA
# 6: a 2 5 22 NA
# 7: a 2 0 4 2
# 8: b 2 0 18 12
(There is also a solution using data.table::foverlaps
that might be faster or more memory-frugal. Read the link, it's quite helpful.)
Data:
d1 <- structure(list(Criteria = c("a", "a", "a", "b", "b", "a", "a",
"b"), Order = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), Low = c(0L,
11L, 21L, 0L, 14L, 5L, 0L, 0L), High = c(10L, 20L, 30L, 13L,
32L, 22L, 4L, 18L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
d2 <- structure(list(Criteria = c("a", "b", "a", "a"), Order = c(1L,
2L, 1L, 2L), Final = c(13L, 12L, 8L, 2L)), class = "data.frame", row.names = c("1",
"2", "3", "4"))
Upvotes: 1