ADITYA KUMAR PINJALA
ADITYA KUMAR PINJALA

Reputation: 61

How to merge two dataframes based on range value of one table

DF1

SIC     Value
350     100
460     500
140     200
290     400
506     450

DF2

SIC1          AREA 
100-200      Forest
201-280      Hospital
281-350      Education
351-450      Government
451-550      Land

Note:class of SIC1 is having character,we need to convert to numeric range

i am trying to get the output like below

Desired output:

DF3

SIC     Value   AREA
350     100    Education
460     500    Land
140     200    Forest
290     400    Education
506     450    Land

i have tried first to convert character class of SIC1 to numeric then tried to merge,but no luck,can someone guide on this?

Upvotes: 1

Views: 237

Answers (2)

MKR
MKR

Reputation: 20095

An option can be to use tidyr::separate along with sqldf to join both tables on range of values.

library(sqldf)
library(tidyr)

DF2 <- separate(DF2, "SIC1",c("Start","End"), sep = "-")

sqldf("select DF1.*, DF2.AREA from DF1, DF2 
      WHERE DF1.SIC between DF2.Start AND DF2.End")

#   SIC Value      AREA
# 1 350   100 Education
# 2 460   500       Lan
# 3 140   200    Forest
# 4 290   400 Education
# 5 506   450       Lan

Data:

DF1 <- read.table(text =
"SIC     Value
350     100
460     500
140     200
290     400
506     450",
header = TRUE, stringsAsFactors = FALSE)

DF2 <- read.table(text =
"SIC1          AREA
100-200      Forest
201-280      Hospital
281-350      Education
351-450      Government
451-550      Lan",
header = TRUE, stringsAsFactors = FALSE)

Upvotes: 3

akrun
akrun

Reputation: 887148

We could do a non-equi join. Split (tstrsplit) the 'SIC1' column in 'DF2' to numeric columns and then do a non-equi join with the first dataset.

library(data.table)
setDT(DF2)[, c('start', 'end') := tstrsplit(SIC1, '-', type.convert = TRUE)]
DF2[, -1, with = FALSE][DF1, on = .(start <= SIC, end >= SIC), 
        mult = 'last'][, .(SIC = start, Value, AREA)]
#  SIC Value      AREA
#1: 350   100 Education
#2: 460   500      Land
#3: 140   200    Forest
#4: 290   400 Education
#5: 506   450      Land

Or as @Frank mentioned we can do a rolling join to extract the 'AREA' and update it on the first dataset

setDT(DF1)[, AREA := DF2[DF1, on=.(start = SIC), roll=TRUE, x.AREA]]

data

DF1 <- structure(list(SIC = c(350L, 460L, 140L, 290L, 506L), Value = c(100L, 
500L, 200L, 400L, 450L)), .Names = c("SIC", "Value"), 
 class = "data.frame", row.names = c(NA, -5L))

DF2 <- structure(list(SIC1 = c("100-200", "201-280", "281-350", "351-450", 
"451-550"), AREA = c("Forest", "Hospital", "Education", "Government", 
"Land")), .Names = c("SIC1", "AREA"), class = "data.frame",
 row.names = c(NA, -5L))

Upvotes: 3

Related Questions