Finding values in data frame based on another data frame in R

Just starting with R (and coding in general)... I have this issue with calculations where I use two different data frames and I couldn't find a solution. I created a simplified example of my problem:

I have two data frames, df_1 and df_2:

df_1:

  Numbers     Assigned_color
        1                  -
        2                  -
        3                  -
        4                  -
        5                  -
        6                  -

df_2:

  Value     Color
      4      Blue
      5    Orange
      6       Red

What I want is to do assign the color for numbers in df_1 based on function which uses values from both data frames. In this example I wish to assign a color if sum of df_1$Number and df_2$Value equals 10. This would lead to following outcome (df_1):

  Numbers     Assigned_color
        1                  -
        2                  -
        3                  -
        4                Red
        5             Orange
        6               Blue

The closest I got is this:

for(i in 2:nrow(df_1)) {
  for(j in 2:nrow(df_2)) {

       df_1$Assigned_color[i] <- ifelse(df_1$Numbers[i] + df_2$Value[j] == 10,
       df_2$Color[j],
       df_1$Assigned_color[i])

  }}

but is doesn't work, because the result I get is this:

  Numbers     Assigned_color
        1                  -
        2                  -
        3                  -
        4                Red
        5             Orange
        6                  -

... and I don't understand why. Thank you so much for your patience.

EDIT: The real function is much more complex and the dataset is very big. Sorry I should have mentioned this. What I'm actually looking for is script where I could enter any kind of long complicated function and based on value, assign the 'color', something like this:

for(i in 2:nrow(df_1)) {
  for(j in 2:nrow(df_2)) {

       df_1$Assigned_color[i] <- ifelse
       [very long function using values from both dataframes] == [desired value],
       df_2$Color[j],
       df_1$Assigned_color[i])

  }}

Upvotes: 0

Views: 63

Answers (3)

Nareman Darwish
Nareman Darwish

Reputation: 1261

You can use dplyr to join the data and get same result;

library(dplyr)

df_1 <-
  data.frame(
    Numbers = c(1,2,3,4,5,6),
    Assigned_color = c(NA_character_,NA_character_,NA_character_,
                       NA_character_,NA_character_,NA_character_),
    stringsAsFactors = FALSE
  )

df_2 <-
  data.frame(
    Value = c(4,5,6),
    Color = c("Blue", "Orange", "Red"),
    stringsAsFactors = FALSE
  )

df_1 %>%
  left_join(df_2, by = c("Numbers" = "Value")) %>%
  mutate(Value = ifelse(!is.na(Color), Numbers, NA),
         Color = ifelse(Value + Numbers == 10, Color, NA_character_)) %>%
  select(Numbers, Color)

#Numbers  Color
# 1   <NA>
# 2   <NA>
# 3   <NA>
# 4   Blue
# 5 Orange
# 6    Red

Upvotes: 0

akrun
akrun

Reputation: 887118

A base R option would be to get the row/column index of an outer sum after converting to a logical matrix ('m1') and then do the assignment based on the index

m1 <- outer(df_2$Value, df_1$Numbers, `+`) == 10
i1 <- which(m1, arr.ind = TRUE)
df_1$Assigned_color[i1[,2]] <- df_2$Color[i1[,1]]
df_1
#  Numbers Assigned_color
#1       1              -
#2       2              -
#3       3              -
#4       4            Red
#5       5         Orange
#6       6           Blue

data

df_1 <- structure(list(Numbers = 1:6, Assigned_color = c("-", "-", "-", 
"-", "-", "-")), class = "data.frame", row.names = c(NA, -6L))

df_2 <- structure(list(Value = 4:6, Color = c("Blue", "Orange", "Red"
)), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

IceCreamToucan
IceCreamToucan

Reputation: 28685

If written as an sql query, you can write the condition that they must sum to 10 as the join condition between the two tables, then do an anti-join of the original table with these results to get the non-matched numbers, and rbind the matches and non-matches together.

library(sqldf)

matches <-  sqldf('
  select  a.Numbers
          , b.Color as Assigned_color
  from    df_1 a
          join df_2 b
            on a.Numbers + b.Value = 10
  ')
nonmatches <-  sqldf('
  select  a.*
  from    df_1 a 
          left join matches b
            on a.Numbers = b.Numbers
  where   b.Numbers is NULL
  ')

rbind(nonmatches, matches)
#   Numbers Assigned_color
# 1       1              -
# 2       2              -
# 3       3              -
# 4       4            Red
# 5       5         Orange
# 6       6           Blue

Upvotes: 2

Related Questions