SUMIT
SUMIT

Reputation: 563

Conditional sum data in R

I have an R data frame with many columns, and I want to sum only columns (header: score) having either cell value >25 or >-25 under row named "Matt". The sum value can be placed after the last column.

input (df1)
Name score score score score score score score
Alex 31 15 18 22 23 23 23
Pat 37 18 29 15 28 28 -28
Matt 33 27 18 88 9 -19 -29
James 12 -36 32 13 21 21 21
output (df2)
Name score score score score score acore score sum
Alex 31 15 18 22 23 23 23 91
Pat 37 18 29 15 28 28 -28 42
Matt 33 27 18 88 9 -19 -29 119
James 12 -36 32 13 21 21 21 10

Any thoughts are more than welcome,

Regards,

Upvotes: 1

Views: 887

Answers (2)

Bruno
Bruno

Reputation: 4150

Here is one way you can solve it in the tidyverse, I use convert it to a longer format so it is easier to deal with it, then we bring it back using a left join

library(tidyverse)

data_example <- tibble::tribble(
    ~Name, ~score1, ~score2, ~score3, ~score4, ~score5, ~score6, ~score7, ~sum,
   "Alex",    31L,    15L,    18L,    22L,    23L,    23L,    23L,  91L,
    "Pat",    37L,    18L,    29L,    15L,    28L,    28L,   -28L,  42L,
   "Matt",    33L,    27L,    18L,    88L,     9L,   -19L,   -29L, 119L,
  "James",    12L,   -36L,    32L,    13L,    21L,    21L,    21L,  10L
  )

valid_condition <- function(x) if_else(x >=25|x<= -25,true = x,false = 0L)

result <- data_example |> 
  pivot_longer(cols = starts_with("score"),names_to = "score_name",values_to = "score_value") |> 
  group_by(score_name) |> 
  filter(score_value[Name == "Matt"] >25|score_value[Name == "Matt"] < -25) |> 
  group_by(Name) |> 
  summarise(final_sum = sum(score_value))

  
data_example |>
  left_join(result)
#> Joining, by = "Name"
#> # A tibble: 4 x 10
#>   Name  score1 score2 score3 score4 score5 score6 score7   sum final_sum
#>   <chr>  <int>  <int>  <int>  <int>  <int>  <int>  <int> <int>     <int>
#> 1 Alex      31     15     18     22     23     23     23    91        91
#> 2 Pat       37     18     29     15     28     28    -28    42        42
#> 3 Matt      33     27     18     88      9    -19    -29   119       119
#> 4 James     12    -36     32     13     21     21     21    10        10

Created on 2021-05-30 by the reprex package (v2.0.0)

Upvotes: 2

akrun
akrun

Reputation: 887118

We create a logical vector based on the 'Name' column ('i1'), then use the OR (| condition on the value 25 and -25 with relational operators (> or < respectively) to create a logical index for the columns. Subset the dataset based on the 'i2', and return the rowSums of those columns and assign it to 'sum' column

i1 <-df1$Name == "Matt" 

i2 <- df1[i1,-1] > 25|df1[i1,-1] < -25
df1$sum <- rowSums(df1[-1][,i2], na.rm = TRUE)

Or using dplyr

library(dplyr)
df1 %>% 
   mutate(Matt = rowSums(select(cur_data(), 
            where(~ is.numeric(.) &&
          (.[Name == 'Matt'] > 25| .[Name == 'Matt'] < -25)))))  

-output

#    Name score score.1 score.2 score.3 score.4 score.5 score.6 Matt
#1  Alex    31      15      18      22      23      23      23   91
#2   Pat    37      18      29      15      28      28     -28   42
#3  Matt    33      27      18      88       9     -19     -29  119
#4 James    12     -36      32      13      21      21      21   10

Upvotes: 3

Related Questions