Basil
Basil

Reputation: 1004

Vectorised calculation

I have data in a wide format containing a Category column listing types of transport and then columns with the name of that type of transport and totals.

I want to create the Calc column where each row is summed across the columns but the value for where the category and column name is the same is excluded.

So for the row total of Car, the sum would be train + bus. The row total of Train would be Car + Bus.

If there is a type of transport in the Category column which isnt listed as a column name, then there should be a NA in the Calc column.

The dataframe is as below, with the Calc column with the results added as expected.

Category<-c("Car","Train","Bus","Bicycle")
Car<-c(9,15,25,5)
Train<-c(8,22,1,7)
Bus<-c(5,2,4,8)
Calc<-c(13, 17,26,NA)
df<-data.frame(Category,Car,Train,Bus,Calc, stringsAsFactors = FALSE)

Can anyone suggest how to add the Calc column as per above? Ideally a vectorised calculation without a loop.

Upvotes: 1

Views: 81

Answers (3)

Ben
Ben

Reputation: 30474

Here is an alternative in base R. You can use apply row-wise through your data.frame. If the Category is one of your columns, then calculate the sum by excluding both the Category column as well as the column corresponding to the Category column. Otherwise, use NA.

df$Calc <- apply(
  df,
  1,
  \(x) {
    if (x["Category"] %in% names(x)) {
      sum(as.numeric(x[setdiff(names(x), c(x["Category"], "Category"))]))
    } else {
      NA_integer_
    }
  }
)

df

Output

  Category Car Train Bus Calc
1      Car   9     8   5   13
2    Train  15    22   2   17
3      Bus  25     1   4   26
4  Bicycle   5     7   8   NA

Upvotes: 1

jblood94
jblood94

Reputation: 16971

Using rowSums and a matrix for indexing.

# Example data
Category <- c("Car","Train","Bus","Bicycle")
Car <- c(9,15,25,5)
Train <- c(8,22,1,7)
Bus <- c(5,2,4,8)
df <- data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)

# add the "Calc" column
df$Calc <- rowSums(df[,2:4]) - df[,2:4][matrix(c(1:nrow(df), match(df$Category, colnames(df)[2:4])), ncol = 2)]
df
#>   Category Car Train Bus Calc
#> 1      Car   9     8   5   13
#> 2    Train  15    22   2   17
#> 3      Bus  25     1   4   26
#> 4  Bicycle   5     7   8   NA

Upvotes: 1

Stefano Barbi
Stefano Barbi

Reputation: 3184

Here is a tidyverse solution:

df<-data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)
df |>
    pivot_longer(cols = !Category,
                 names_to = "cat2",
                 values_to = "value") |>
    group_by(Category) |>
    mutate(value = case_when((Category %in% cat2) ~ value,
                             TRUE ~ NA_real_)) |>
    filter(cat2 != Category) |>
    summarize(Calc = sum(value))  |>
    left_join(df)



# A tibble: 4 × 5
  Category  Calc   Car Train   Bus
  <chr>    <dbl> <dbl> <dbl> <dbl>
1 Bicycle     NA     5     7     8
2 Bus         26    25     1     4
3 Car         13     9     8     5
4 Train       17    15    22     2

Upvotes: 1

Related Questions