hanzgs
hanzgs

Reputation: 1616

How to filter one data frame based on another dataframe cell values present as a condition?

I have First data frame as

feature  feature_weight    feature_desc
A        0.046891755        -0.831 < A      
B        -0.036292305       B <= -0.243     
C        0.008370983        0.317 < C       
D        0.007841638        0.212 < D

Second Data Frame as

variable   binning     percent
A          ar1         -0.224
A          ar2         0.715
A          ar3         -0.831
B          br1         -0.243
B          br2         -0.016
B          br3         0.128
C          cr1         0.102
C          cr2         0.317
C          cr3         -0.022
D          dr1         -0.522
D          dr2         0.212
D          dr3         -0.589

I need the binning values in the Second data frame to be added as new column in First data frame, but only those specific values filtered out from the character data type of feature_desc column, as below

feature  feature_weight    feature_desc   binning
A        0.046891755        -0.831 < A    ar3       
B        -0.036292305       B <= -0.243   br1   
C        0.008370983        0.317 < C     cr2   
D        0.007841638        0.212 < D     dr2

The feature_desc is a character type

Upvotes: 1

Views: 56

Answers (2)

hanzgs
hanzgs

Reputation: 1616

I filtered the numerics as new_values

gsub("[^0-9\\.\\-]", "", df1$feature_desc)

Then removed the duplicates of variable in df2 based on value from df1

df3 %>%  filter(percent %in% df1$new_values)

then merged both

merge(df3, df4, by.x='variable', by.y='feature')

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388797

Once you filter out the numerics, we can merge the two dataframes to get binning value.

df1$value <- gsub("[^0-9\\.\\-]", "", df1$feature_desc) 

merge(df1, df2, by.x = c("feature", "value"), by.y = c("variable", "percent"))


#  feature  value feature_weight feature_desc binning
#1       A -0.831    0.046891755   -0.831 < A     ar3
#2       B -0.243   -0.036292305  B <= -0.243     br1
#3       C  0.317    0.008370983    0.317 < C     cr2
#4       D  0.212    0.007841638    0.212 < D     dr2

You can delete the value column later if not needed.

Upvotes: 1

Related Questions