Reputation: 563
I have an R data frame with many columns, and I want to sum only columns (header: score) having either cell value >25 or >-25 under row named "Matt". The sum value can be placed after the last column.
input (df1)
Name | score | score | score | score | score | score | score |
---|---|---|---|---|---|---|---|
Alex | 31 | 15 | 18 | 22 | 23 | 23 | 23 |
Pat | 37 | 18 | 29 | 15 | 28 | 28 | -28 |
Matt | 33 | 27 | 18 | 88 | 9 | -19 | -29 |
James | 12 | -36 | 32 | 13 | 21 | 21 | 21 |
output (df2)
Name | score | score | score | score | score | acore | score | sum |
---|---|---|---|---|---|---|---|---|
Alex | 31 | 15 | 18 | 22 | 23 | 23 | 23 | 91 |
Pat | 37 | 18 | 29 | 15 | 28 | 28 | -28 | 42 |
Matt | 33 | 27 | 18 | 88 | 9 | -19 | -29 | 119 |
James | 12 | -36 | 32 | 13 | 21 | 21 | 21 | 10 |
Any thoughts are more than welcome,
Regards,
Upvotes: 1
Views: 887
Reputation: 4150
Here is one way you can solve it in the tidyverse, I use convert it to a longer format so it is easier to deal with it, then we bring it back using a left join
library(tidyverse)
data_example <- tibble::tribble(
~Name, ~score1, ~score2, ~score3, ~score4, ~score5, ~score6, ~score7, ~sum,
"Alex", 31L, 15L, 18L, 22L, 23L, 23L, 23L, 91L,
"Pat", 37L, 18L, 29L, 15L, 28L, 28L, -28L, 42L,
"Matt", 33L, 27L, 18L, 88L, 9L, -19L, -29L, 119L,
"James", 12L, -36L, 32L, 13L, 21L, 21L, 21L, 10L
)
valid_condition <- function(x) if_else(x >=25|x<= -25,true = x,false = 0L)
result <- data_example |>
pivot_longer(cols = starts_with("score"),names_to = "score_name",values_to = "score_value") |>
group_by(score_name) |>
filter(score_value[Name == "Matt"] >25|score_value[Name == "Matt"] < -25) |>
group_by(Name) |>
summarise(final_sum = sum(score_value))
data_example |>
left_join(result)
#> Joining, by = "Name"
#> # A tibble: 4 x 10
#> Name score1 score2 score3 score4 score5 score6 score7 sum final_sum
#> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 Alex 31 15 18 22 23 23 23 91 91
#> 2 Pat 37 18 29 15 28 28 -28 42 42
#> 3 Matt 33 27 18 88 9 -19 -29 119 119
#> 4 James 12 -36 32 13 21 21 21 10 10
Created on 2021-05-30 by the reprex package (v2.0.0)
Upvotes: 2
Reputation: 887118
We create a logical vector based on the 'Name' column ('i1'), then use the OR (|
condition on the value 25 and -25 with relational operators (>
or <
respectively) to create a logical index for the columns. Subset the dataset based on the 'i2', and return the rowSums
of those columns and assign it to 'sum' column
i1 <-df1$Name == "Matt"
i2 <- df1[i1,-1] > 25|df1[i1,-1] < -25
df1$sum <- rowSums(df1[-1][,i2], na.rm = TRUE)
Or using dplyr
library(dplyr)
df1 %>%
mutate(Matt = rowSums(select(cur_data(),
where(~ is.numeric(.) &&
(.[Name == 'Matt'] > 25| .[Name == 'Matt'] < -25)))))
-output
# Name score score.1 score.2 score.3 score.4 score.5 score.6 Matt
#1 Alex 31 15 18 22 23 23 23 91
#2 Pat 37 18 29 15 28 28 -28 42
#3 Matt 33 27 18 88 9 -19 -29 119
#4 James 12 -36 32 13 21 21 21 10
Upvotes: 3