confusedindividual
confusedindividual

Reputation: 619

Finding certain rows and determining if they are negative or positive

I am trying to do something quite odd. I would like to somehow determine if the row of a separate data.frame is mostly negative or positive based on row numbers in a column of a different data.frame... I have included my example data frames (data.frame1 and data.frame2) and the desired output

>data.frame1
     col_a col_b  col_c
 1   1     1       1
 2   1    -1      -1 
 3   1     1       1 
 4  -1    -1      -1 
 5   1     1      -1

>data.frame2
   col_a
1    2
2    3
3    5


Example output/result 
     col_a     col_b
1      2     negative
2      3     positive
3      5     positive

Another example output could be

Example 2 output/result 
     col_a  col_b
1      2    -1
2      3     1
3      5     1

Dataframe1

datafram2 enter image description here enter image description here

Thank you for all the answers. Follow up to the first question: is there a way to get the output with the measurements from an ID column included? See example data frames and desired output below

 >data.frame1
       ID    col_a col_b  col_c
 1   Fish1   1     1       1
 2   Fish1   1    -1      -1 
 3   Fish1   1     1       1 
 4   Fish1  -1    -1      -1 
 5   Fish1   1     1      -1

>data.frame2
   col_a
1    2
2    3
3    5


Example output/result 
    ID   col_a     col_b
1  Fish1    2     -1
2  Fish1    3      1
3  Fish1    5      1

Upvotes: 2

Views: 365

Answers (3)

jay.sf
jay.sf

Reputation: 73004

Adding 2 to sign of rowMeans and subset a labels vector.

Consider this data frame, where "neutral" outcomes are also possible.

d1
#   col_a col_b col_c col_d
# 1     1     1     1     1
# 2     1    -1    -1     1
# 3     1     1     1     1
# 4    -1    -1    -1    -1
# 5     1     1    -1     1
# 6     1    -1    -1    -1

In d2 I use the complete subset for demonstration.

d2 |> 
  transform(res=c('negative', 'neutral', 'positive')[
    sign(rowMeans(d1[d2$col_a, ])) + 2])
#   col_a      res
# 1     1 positive
# 2     2  neutral
# 3     3 positive
# 4     4 negative
# 5     5 positive
# 6     6 negative

Data:

d1 <- structure(list(col_a = c(1L, 1L, 1L, -1L, 1L, 1L), col_b = c(1L, 
-1L, 1L, -1L, 1L, -1L), col_c = c(1L, -1L, 1L, -1L, -1L, -1L), 
    col_d = c(1L, 1L, 1L, -1L, 1L, -1L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

d2 <- structure(list(col_a = 1:6), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1

Sandwichnick
Sandwichnick

Reputation: 1466

This is my solution using rowMeans instead of RowSums.

#making the example
df1 <- data.frame("a" = c(1,1,1,-1,1),
                  "b" = c(1,-1,1,-1,1),
                  "c"=  c(1,-1,1,-1,-1))

df2 <- data.frame("col_a" = c(2,3,5))

# Check if all indices of rows in d2 are in df1
stopifnot(all(df2$col_a %in% c(1:nrow(df1))))

# no second sign to get a logical value
df2$col_b <- rowMeans(sign(df1[df2$col_a,])) >= 0

# make to factor
df2$col_b <- ifelse(df2$col_b,"positive","negative")

Upvotes: 2

zx8754
zx8754

Reputation: 56189

Subset first dataframe rows based on second dataframe, then check the sign, then get the sum for the rows, then get sign again:

cbind(data.frame2,
      result = sign(rowSums(sign(data.frame1[ data.frame2$col_a, ]))))
#   col_a result
# 1     2     -1
# 2     3      1
# 3     5      1

Note: if the first dataframe has only values -1 and 1, then we can drop the inner sign step:

sign(rowSums(data.frame1[ data.frame2$col_a, ]))

Upvotes: 2

Related Questions