Reputation: 35
This is probably a very simple question, but I'm having a hard time figuring it out, even after browsing many similar posts.
I have two data frames, dvalues
and svalues
.
dvalues
:
district districtID value state
Badgam 1002 30.2 N/A
Pulwama 1012 10.9 N/A
Kangra 2002 10.2 N/A
Amritsar 3015 29.8 N/A
...
and svalues
:
state stateID
Jammu & Kashmir 1000
Himachal Pradesh 2000
Punjab 3000
....
I want to add a value in the state column for each row, conditional on the districtID
value being between two stateID
values.
For example, since Badgam
and Pulwama
(first two values in File 1) have districtID
values between 1000 and 2000, the state name should be "Jammu & Kashmir"
. Similarly, the district Kangra
with an ID between 2000 and 3000 should have a state name "Himachal Pradesh"
.
What my final result should look like:
district districtID value state
Badgam 1002 30.2 Jammu & Kashmir
Pulwama 1012 10.9 Jammu & Kashmir
Kangra 2002 10.2 Himachal Pradesh
Amritsar 3015 29.8 Punjab
...
Out of many attempts, this is the only one that has worked:
dvalues$state<-
ifelse(dvalues$districtID<2000,"Jammu & Kashmir",
ifelse(dvalues$districtID>2000 & dvalues$districtID<3000,"Himachal Pradesh",
ifelse(dvalues$districtID>3000 & dvalues$districtID<4000,"Punjab",
ifelse(dvalues$districtID>4000 & dvalues$districtID<5000,"Chandigarh",
...
However, this is slow and ugly, not to mention there are 36 lines - I was wondering if there is a more elegant solution.
Thank you, I really appreciate your answers.
Upvotes: 0
Views: 75
Reputation: 12875
If your interval will always remain thousand, then a simple workaround would be -
dvalues$stateID <- dvalues$districtID - (dvalues$districtID %% 1000)
dvalues <- merge(dvalues, svalues, by = 'stateID')
But in general, for these merge-within-interval sort of situations, I use data.table
's roll
argument -
library(data.table)
# converting data.frames to data.tables
svalues <- data.table(svalues)
dvalues <- data.table(dvalues)
# removing state column from dvalues as we will be getting that from svalues
dvalues[,state := NULL]
#setting keys
setkeyv(svalues,'stateID')
setkeyv(dvalues,'districtID')
# merging the data sets based on values in the key columns
# roll = Inf looks for the previous matching value on the key and merges with that
# see help entry for data.table to understand more
svalues[dvalues, roll = +Inf]
Upvotes: 1