Ds9
Ds9

Reputation: 11

Is there a way to merge two dataframes by a mathematical operation in R?

I a working on a little R project.

Given two data frames with different length:

df1 = data.frame(Plane.Id = c(19924519, 19924321, 19992436, 19924119, 19924208, 19924330), 
                 Block.ID = c(090LC, 090LC, 001UG, 002LM, 001OI, 001UG), 
                 Hour1 = c(0.02222222, 0.02222222, 15.07222, 15.44444, 6.652778, 3.286111))

df2 = data.frame(Block.Id = c(090LC, 001UG, 001UG, 002LM, 001OI), 
                 Sector.ID = c(BIRDFIS, UKOVS, LLLLALL, EBBUEHS, LEBLDDN), 
                 Hour_In = c(0.000000, 0.000000, 13.000000, 0.000000, 0.000000), 
                 Hour_Out = c(23.50000, 13.000000, 23.50000, 23.50000, 23.50000))

Different Sector.ID are assigned to the same Block.ID depending on the hour of the day.

Is it possible to merge them into a single dataframe following the next conditions?:

What I am looking for is a dataframe with the lenght of df1 that contains the data Plane.ID, Block.ID and Sector.ID. Something like this (I do not know how to build a table in here, so I uploaded an image with the table):

df_final

enter image description here

I have tried with rbind, left_join, merge, cbind and nothing good came out. I even tried to do this with a loop but not a good idea.

Upvotes: 0

Views: 147

Answers (2)

Ben
Ben

Reputation: 30474

Here is an alternative solution using data.table:

library(data.table)

setDT(df1)
setDT(df2)

df1[df2, on = .(Block.ID, Hour1 >= Hour_In, Hour1 <= Hour_Out), .(Plane.Id, Block.ID, Sector.ID)]

Output

   Plane.Id Block.ID Sector.ID
1: 19924519    090LC   BIRDFIS
2: 19924321    090LC   BIRDFIS
3: 19924330    001UG     UKOVS
4: 19992436    001UG   LLLLALL
5: 19924119    002LM   EBBUEHS
6: 19924208    001OI   LEBLDDN

Upvotes: 0

Wojciech Kulma
Wojciech Kulma

Reputation: 6446

How about inner joining on the "Block_id" and filtering by the "Hour1" using dplyr?

df1 = 
  data.frame(
    Plane.Id = c(19924519, 19924321, 19992436, 19924119, 19924208, 19924330), 
    Block.ID = c("090LC", "090LC", "001UG", "002LM", "001OI", "001UG"), 
    Hour1    = c(0.02222222, 0.02222222, 15.07222, 15.44444, 6.652778, 3.286111)
  )

df2 = data.frame(
  Block.ID = c("090LC", "001UG", "001UG", "002LM", "001OI"), 
  Sector.ID = c("BIRDFIS", "UKOVS", "LLLLALL", "EBBUEHS", "LEBLDDN"), 
  Hour_In = c(0.000000, 0.000000, 13.000000, 0.000000, 0.000000), 
  Hour_Out = c(23.50000, 13.000000, 23.50000, 23.50000, 23.50000)
)

dplyr::inner_join(df1, df2, by="Block.ID") %>%
dplyr::filter(Hour1 > Hour_In & Hour1 < Hour_Out)

Upvotes: 0

Related Questions