Biostatician
Biostatician

Reputation: 111

How to merge two dataframes with conditional statements

I have two df I want to merge.

df1 looks like this:

CODE    TAX   PRICE   TOTAL
4577    0.9   99.1    100
8965    NA    25      25
7788    1.5   13.5    15
4021    NA    20      20
6987    0.00  40      40

df1 is a dataframe containing codes, tax cost depending on prices and the total (sum) of taxes + prices.

df2 looks like this:

CODE    TAX.CLASS   MEANING
4577     CLASS1     Car
4577     CLASS2     Car
8965     CLASS1     Internet
2635     CLASS1     Computer
7788     CLASS1     Phone
7788     CLASS2     Phone
1258     CLASS1     Water
4021     CLASS1     Food
6987     CLASS1     Gasoline
6987     CLASS2     Gasoline

where each code displays a tax.class and meaning. I would like to get an output like this:

CODE    MEANING   TAX.CLASS   TAX   PRICE   TOTAL
4577    Car        CLASS2     0.9   99.1    100
8965    Internet   CLASS1     NA    25      25
7788    Phone      CLASS2     1.5   13.5    15
4021    Food       CLASS1     NA    20      20
6987    Gasoline   CLASS1     0.00  40      40

So far I have tried with the answers of these questions:

but it has not worked as I want. The issue basically is to match every code in df1 with the code in df2 but retrieving CLASS1 if there is NA or a TAX value in df1 <= 0.00, on the other hand, retrieve CLASS2 if there is a TAX value in df1 > 0.00 on codes repeated twice.

Any suggestions?

Preferably BaseR.

Upvotes: 0

Views: 1149

Answers (2)

arg0naut91
arg0naut91

Reputation: 14774

You could do:

df1$TAX.CLASS<- ifelse(is.na(df1$TAX) | df1$TAX <= 0.00, "CLASS1", "CLASS2")

df_final <- merge(df1, df2, by = c("CODE", "TAX.CLASS"), all.x = TRUE)

Output:

   CODE TAX.CLASS TAX PRICE TOTAL  MEANING
1: 4021    CLASS1  NA  20.0    20     Food
2: 4577    CLASS2 0.9  99.1   100      Car
3: 6987    CLASS1 0.0  40.0    40 Gasoline
4: 7788    CLASS2 1.5  13.5    15    Phone
5: 8965    CLASS1  NA  25.0    25 Internet

This is a solution in base, but the same logic can be applied in any package.

Note that you could also use with in the first statement to avoid referencing the data frame:

df1$TAX.CLASS<- with(df1, ifelse(is.na(TAX) | TAX <= 0.00, "CLASS1", "CLASS2"))

Upvotes: 3

iod
iod

Reputation: 7592

Using dplyr, we can create a TAX.CLASS in df1, and then left_join.

df1 %>% dplyr::mutate(TAX.CLASS=ifelse(TAX<=0 | is.na(TAX),"CLASS1","CLASS2")) %>%
  dplyr::left_join(df2, by=c("CODE","TAX.CLASS"))

  CODE TAX PRICE TOTAL TAX.CLASS  MEANING
1 4577 0.9  99.1   100    CLASS2      Car
2 8965  NA  25.0    25    CLASS1 Internet
3 7788 1.5  13.5    15    CLASS2    Phone
4 4021  NA  20.0    20    CLASS1     Food
5 6987 0.0  40.0    40    CLASS1 Gasoline

Upvotes: 1

Related Questions