Michelle
Michelle

Reputation: 35

Adding values to a data frame conditional on another data frame

This is probably a very simple question, but I'm having a hard time figuring it out, even after browsing many similar posts.

I have two data frames, dvalues and svalues.

dvalues:

district   districtID   value    state
Badgam     1002          30.2    N/A
Pulwama    1012          10.9    N/A
Kangra     2002          10.2    N/A
Amritsar   3015          29.8    N/A
...

and svalues:

state            stateID
Jammu & Kashmir     1000
Himachal Pradesh    2000
Punjab              3000
....

I want to add a value in the state column for each row, conditional on the districtID value being between two stateID values.

For example, since Badgam and Pulwama (first two values in File 1) have districtID values between 1000 and 2000, the state name should be "Jammu & Kashmir". Similarly, the district Kangra with an ID between 2000 and 3000 should have a state name "Himachal Pradesh".

What my final result should look like:

district   districtID   value    state
Badgam     1002          30.2    Jammu & Kashmir
Pulwama    1012          10.9    Jammu & Kashmir
Kangra     2002          10.2    Himachal Pradesh
Amritsar   3015          29.8    Punjab
...

Out of many attempts, this is the only one that has worked:

dvalues$state<-
 ifelse(dvalues$districtID<2000,"Jammu & Kashmir", 
  ifelse(dvalues$districtID>2000 & dvalues$districtID<3000,"Himachal Pradesh",
    ifelse(dvalues$districtID>3000 & dvalues$districtID<4000,"Punjab",
      ifelse(dvalues$districtID>4000 & dvalues$districtID<5000,"Chandigarh",
       ...

However, this is slow and ugly, not to mention there are 36 lines - I was wondering if there is a more elegant solution.

Thank you, I really appreciate your answers.

Upvotes: 0

Views: 75

Answers (1)

TheComeOnMan
TheComeOnMan

Reputation: 12875

If your interval will always remain thousand, then a simple workaround would be -

dvalues$stateID <- dvalues$districtID - (dvalues$districtID %% 1000)
dvalues <- merge(dvalues, svalues, by = 'stateID')

But in general, for these merge-within-interval sort of situations, I use data.table's roll argument -

library(data.table)

# converting data.frames to data.tables
svalues <- data.table(svalues)
dvalues <- data.table(dvalues)

# removing state column from dvalues as we will be getting that from svalues
dvalues[,state := NULL]

#setting keys
setkeyv(svalues,'stateID')
setkeyv(dvalues,'districtID')

# merging the data sets based on values in the key columns
# roll = Inf looks for the previous matching value on the key and merges with that
# see help entry for data.table to understand more
svalues[dvalues, roll = +Inf]

Upvotes: 1

Related Questions