Reputation: 619
I am trying to do something quite odd. I would like to somehow determine if the row of a separate data.frame is mostly negative or positive based on row numbers in a column of a different data.frame... I have included my example data frames (data.frame1 and data.frame2) and the desired output
>data.frame1
col_a col_b col_c
1 1 1 1
2 1 -1 -1
3 1 1 1
4 -1 -1 -1
5 1 1 -1
>data.frame2
col_a
1 2
2 3
3 5
Example output/result
col_a col_b
1 2 negative
2 3 positive
3 5 positive
Another example output could be
Example 2 output/result
col_a col_b
1 2 -1
2 3 1
3 5 1
Thank you for all the answers. Follow up to the first question: is there a way to get the output with the measurements from an ID column included? See example data frames and desired output below
>data.frame1
ID col_a col_b col_c
1 Fish1 1 1 1
2 Fish1 1 -1 -1
3 Fish1 1 1 1
4 Fish1 -1 -1 -1
5 Fish1 1 1 -1
>data.frame2
col_a
1 2
2 3
3 5
Example output/result
ID col_a col_b
1 Fish1 2 -1
2 Fish1 3 1
3 Fish1 5 1
Upvotes: 2
Views: 365
Reputation: 73004
Adding 2
to sign
of rowMeans
and subset a labels vector.
Consider this data frame, where "neutral" outcomes are also possible.
d1
# col_a col_b col_c col_d
# 1 1 1 1 1
# 2 1 -1 -1 1
# 3 1 1 1 1
# 4 -1 -1 -1 -1
# 5 1 1 -1 1
# 6 1 -1 -1 -1
In d2
I use the complete subset for demonstration.
d2 |>
transform(res=c('negative', 'neutral', 'positive')[
sign(rowMeans(d1[d2$col_a, ])) + 2])
# col_a res
# 1 1 positive
# 2 2 neutral
# 3 3 positive
# 4 4 negative
# 5 5 positive
# 6 6 negative
Data:
d1 <- structure(list(col_a = c(1L, 1L, 1L, -1L, 1L, 1L), col_b = c(1L,
-1L, 1L, -1L, 1L, -1L), col_c = c(1L, -1L, 1L, -1L, -1L, -1L),
col_d = c(1L, 1L, 1L, -1L, 1L, -1L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
d2 <- structure(list(col_a = 1:6), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 1
Reputation: 1466
This is my solution using rowMeans
instead of RowSums
.
#making the example
df1 <- data.frame("a" = c(1,1,1,-1,1),
"b" = c(1,-1,1,-1,1),
"c"= c(1,-1,1,-1,-1))
df2 <- data.frame("col_a" = c(2,3,5))
# Check if all indices of rows in d2 are in df1
stopifnot(all(df2$col_a %in% c(1:nrow(df1))))
# no second sign to get a logical value
df2$col_b <- rowMeans(sign(df1[df2$col_a,])) >= 0
# make to factor
df2$col_b <- ifelse(df2$col_b,"positive","negative")
Upvotes: 2
Reputation: 56189
Subset first dataframe rows based on second dataframe, then check the sign, then get the sum for the rows, then get sign again:
cbind(data.frame2,
result = sign(rowSums(sign(data.frame1[ data.frame2$col_a, ]))))
# col_a result
# 1 2 -1
# 2 3 1
# 3 5 1
Note: if the first dataframe has only values -1 and 1, then we can drop the inner sign step:
sign(rowSums(data.frame1[ data.frame2$col_a, ]))
Upvotes: 2