Reputation: 3
Just starting with R (and coding in general)... I have this issue with calculations where I use two different data frames and I couldn't find a solution. I created a simplified example of my problem:
I have two data frames, df_1 and df_2:
df_1:
Numbers Assigned_color
1 -
2 -
3 -
4 -
5 -
6 -
df_2:
Value Color
4 Blue
5 Orange
6 Red
What I want is to do assign the color for numbers in df_1 based on function which uses values from both data frames. In this example I wish to assign a color if sum of df_1$Number and df_2$Value equals 10. This would lead to following outcome (df_1):
Numbers Assigned_color
1 -
2 -
3 -
4 Red
5 Orange
6 Blue
The closest I got is this:
for(i in 2:nrow(df_1)) {
for(j in 2:nrow(df_2)) {
df_1$Assigned_color[i] <- ifelse(df_1$Numbers[i] + df_2$Value[j] == 10,
df_2$Color[j],
df_1$Assigned_color[i])
}}
but is doesn't work, because the result I get is this:
Numbers Assigned_color
1 -
2 -
3 -
4 Red
5 Orange
6 -
... and I don't understand why. Thank you so much for your patience.
EDIT: The real function is much more complex and the dataset is very big. Sorry I should have mentioned this. What I'm actually looking for is script where I could enter any kind of long complicated function and based on value, assign the 'color', something like this:
for(i in 2:nrow(df_1)) {
for(j in 2:nrow(df_2)) {
df_1$Assigned_color[i] <- ifelse
[very long function using values from both dataframes] == [desired value],
df_2$Color[j],
df_1$Assigned_color[i])
}}
Upvotes: 0
Views: 63
Reputation: 1261
You can use dplyr to join the data and get same result;
library(dplyr)
df_1 <-
data.frame(
Numbers = c(1,2,3,4,5,6),
Assigned_color = c(NA_character_,NA_character_,NA_character_,
NA_character_,NA_character_,NA_character_),
stringsAsFactors = FALSE
)
df_2 <-
data.frame(
Value = c(4,5,6),
Color = c("Blue", "Orange", "Red"),
stringsAsFactors = FALSE
)
df_1 %>%
left_join(df_2, by = c("Numbers" = "Value")) %>%
mutate(Value = ifelse(!is.na(Color), Numbers, NA),
Color = ifelse(Value + Numbers == 10, Color, NA_character_)) %>%
select(Numbers, Color)
#Numbers Color
# 1 <NA>
# 2 <NA>
# 3 <NA>
# 4 Blue
# 5 Orange
# 6 Red
Upvotes: 0
Reputation: 887118
A base R
option would be to get the row/column index of an outer sum after converting to a logical matrix ('m1') and then do the assignment based on the index
m1 <- outer(df_2$Value, df_1$Numbers, `+`) == 10
i1 <- which(m1, arr.ind = TRUE)
df_1$Assigned_color[i1[,2]] <- df_2$Color[i1[,1]]
df_1
# Numbers Assigned_color
#1 1 -
#2 2 -
#3 3 -
#4 4 Red
#5 5 Orange
#6 6 Blue
df_1 <- structure(list(Numbers = 1:6, Assigned_color = c("-", "-", "-",
"-", "-", "-")), class = "data.frame", row.names = c(NA, -6L))
df_2 <- structure(list(Value = 4:6, Color = c("Blue", "Orange", "Red"
)), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1
Reputation: 28685
If written as an sql query, you can write the condition that they must sum to 10 as the join condition between the two tables, then do an anti-join of the original table with these results to get the non-matched numbers, and rbind the matches and non-matches together.
library(sqldf)
matches <- sqldf('
select a.Numbers
, b.Color as Assigned_color
from df_1 a
join df_2 b
on a.Numbers + b.Value = 10
')
nonmatches <- sqldf('
select a.*
from df_1 a
left join matches b
on a.Numbers = b.Numbers
where b.Numbers is NULL
')
rbind(nonmatches, matches)
# Numbers Assigned_color
# 1 1 -
# 2 2 -
# 3 3 -
# 4 4 Red
# 5 5 Orange
# 6 6 Blue
Upvotes: 2