Reputation: 1004
I have data in a wide format containing a Category column listing types of transport and then columns with the name of that type of transport and totals.
I want to create the Calc column where each row is summed across the columns but the value for where the category and column name is the same is excluded.
So for the row total of Car, the sum would be train + bus. The row total of Train would be Car + Bus.
If there is a type of transport in the Category column which isnt listed as a column name, then there should be a NA in the Calc column.
The dataframe is as below, with the Calc column with the results added as expected.
Category<-c("Car","Train","Bus","Bicycle")
Car<-c(9,15,25,5)
Train<-c(8,22,1,7)
Bus<-c(5,2,4,8)
Calc<-c(13, 17,26,NA)
df<-data.frame(Category,Car,Train,Bus,Calc, stringsAsFactors = FALSE)
Can anyone suggest how to add the Calc column as per above? Ideally a vectorised calculation without a loop.
Upvotes: 1
Views: 81
Reputation: 30474
Here is an alternative in base R. You can use apply
row-wise through your data.frame. If the Category
is one of your columns, then calculate the sum
by excluding both the Category
column as well as the column corresponding to the Category
column. Otherwise, use NA
.
df$Calc <- apply(
df,
1,
\(x) {
if (x["Category"] %in% names(x)) {
sum(as.numeric(x[setdiff(names(x), c(x["Category"], "Category"))]))
} else {
NA_integer_
}
}
)
df
Output
Category Car Train Bus Calc
1 Car 9 8 5 13
2 Train 15 22 2 17
3 Bus 25 1 4 26
4 Bicycle 5 7 8 NA
Upvotes: 1
Reputation: 16971
Using rowSums
and a matrix for indexing.
# Example data
Category <- c("Car","Train","Bus","Bicycle")
Car <- c(9,15,25,5)
Train <- c(8,22,1,7)
Bus <- c(5,2,4,8)
df <- data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)
# add the "Calc" column
df$Calc <- rowSums(df[,2:4]) - df[,2:4][matrix(c(1:nrow(df), match(df$Category, colnames(df)[2:4])), ncol = 2)]
df
#> Category Car Train Bus Calc
#> 1 Car 9 8 5 13
#> 2 Train 15 22 2 17
#> 3 Bus 25 1 4 26
#> 4 Bicycle 5 7 8 NA
Upvotes: 1
Reputation: 3184
Here is a tidyverse solution:
df<-data.frame(Category,Car,Train,Bus, stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)
df |>
pivot_longer(cols = !Category,
names_to = "cat2",
values_to = "value") |>
group_by(Category) |>
mutate(value = case_when((Category %in% cat2) ~ value,
TRUE ~ NA_real_)) |>
filter(cat2 != Category) |>
summarize(Calc = sum(value)) |>
left_join(df)
# A tibble: 4 × 5
Category Calc Car Train Bus
<chr> <dbl> <dbl> <dbl> <dbl>
1 Bicycle NA 5 7 8
2 Bus 26 25 1 4
3 Car 13 9 8 5
4 Train 17 15 22 2
Upvotes: 1