Ninke

Reputation: 257

Why does sum() not produce the same values as addition when applied to a dataframe?

Out of curiosity: I thought it was possible to use sum() to create a new variable in an R data frame, the objective being to calculate an overall score from several single values. However, sum() apparently sums all the values in a column and not just the values of a single case. What is the mechanism behind this, and is there a function that adds the values the way simple addition does?

Daten <- data.frame(
  cases = c("first", "second", "third"), 
  values1= c(1,2,3),
  values2= c(27,19,34),
  values3= c(2,8,7)
)




Daten$valcomb = sum(Daten$values1,Daten$values2,Daten$values3)

Daten$valcomb2 = Daten$values1+Daten$values2+Daten$values3

print(Daten)

Output

   cases values1 values2 values3 valcomb valcomb2
1  first       1      27       2     103       30
2 second       2      19       8     103       29
3  third       3      34       7     103       44

Upvotes: 1

Views: 858

Answers (2)

Merijn van Tilborg

Reputation: 5897

This has nothing to do with data frames; it comes down to the behaviour of + as an operator versus the behaviour of sum() as a function.

The + operator works element-wise on vectors: it adds the elements that share the same position.

c(1,2,3) + c(27,19,34) + c(2,8,7)
# [1] 30 29 44

Be aware, though, that + recycles the shorter vector when the lengths differ. It does so silently when the longer length is a multiple of the shorter one, and only throws a warning otherwise.

c(1,2,3,4) + c(27,19,34) + c(2,8,7)
# [1] 30 29 44 33
# Warning messages:
# 1: In c(1, 2, 3, 4) + c(27, 19, 34) :
#   longer object length is not a multiple of shorter object length
# 2: In c(1, 2, 3, 4) + c(27, 19, 34) + c(2, 8, 7) :
#   longer object length is not a multiple of shorter object length
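
To illustrate the silent case: when the longer length is an exact multiple of the shorter one, R recycles without any warning. A minimal sketch, using made-up vectors `a` and `b` that are not part of the question:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# length(a) is a multiple of length(b), so b is reused as c(10, 20, 10, 20)
# and no warning is raised
res <- a + b
print(res)
# [1] 11 22 13 24
```

This is why a length mismatch can go unnoticed, so it may be worth checking lengths explicitly before adding vectors.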

The sum() function, in contrast, adds up all values of all its arguments and returns a single number, by definition of the function.

sum(c(1,2,3), c(27,19,34), c(2,8,7))
# [1] 103
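
That also accounts for the 103 in every row of the question's output: sum() returns a single number, and assigning a length-one value to a data frame column repeats it for every row. The sketch below uses a reduced copy of the question's data (`df` and `total` are illustrative names) and also shows Reduce() with `+` as one possible way to get element-wise addition in function form:

```r
df <- data.frame(
  values1 = c(1, 2, 3),
  values2 = c(27, 19, 34),
  values3 = c(2, 8, 7)
)

# sum() collapses everything into one number ...
total <- sum(df$values1, df$values2, df$values3)
length(total)
# [1] 1

# ... which is then recycled down the whole column
df$valcomb <- total

# Reduce() applies `+` pairwise, so the result stays element-wise
df$valcomb2 <- Reduce(`+`, list(df$values1, df$values2, df$values3))
print(df)
#   values1 values2 values3 valcomb valcomb2
# 1       1      27       2     103       30
# 2       2      19       8     103       29
# 3       3      34       7     103       44
```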

Update

Besides the theory on the behaviour of the + operator and the sum() function, here is an example of some functions for working with data.frame data.

library(dplyr)

Daten %>%
  # sums all columns that yield numeric values
  mutate(sum_all = rowSums(across(where(is.numeric)))) %>%
  # sums values1 and values3
  mutate(sum_1_3 = rowSums(across(c("values1", "values3"))))

#    cases values1 values2 values3 sum_all sum_1_3
# 1  first       1      27       2      30       3
# 2 second       2      19       8      29      10
# 3  third       3      34       7      44      10

Upvotes: 1

Allan Cameron

Reputation: 174586

If you are summing rows, you need to use rowSums rather than sum. Obviously, you can't include the non-numeric cases column, so you need rowSums(Daten[-1]) to get the sums across the numeric columns.

within(Daten, sums <- rowSums(Daten[-1]))
#>    cases values1 values2 values3 sums
#> 1  first       1      27       2   30
#> 2 second       2      19       8   29
#> 3  third       3      34       7   44

Or, if you are using dplyr:

Daten %>% mutate(sums = rowSums(.[-1]))
#>    cases values1 values2 values3 sums
#> 1  first       1      27       2   30
#> 2 second       2      19       8   29
#> 3  third       3      34       7   44
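
Dropping the first column by position works here, but it depends on the column order. As a hedged alternative in base R, the numeric columns can be picked by type instead. This is only a sketch; `num_cols` is an illustrative name, not something from the answer:

```r
Daten <- data.frame(
  cases = c("first", "second", "third"),
  values1 = c(1, 2, 3),
  values2 = c(27, 19, 34),
  values3 = c(2, 8, 7)
)

# logical vector marking which columns are numeric
num_cols <- vapply(Daten, is.numeric, logical(1))

# sum only those columns, row by row
Daten$sums <- rowSums(Daten[num_cols])
print(Daten)
#    cases values1 values2 values3 sums
# 1  first       1      27       2   30
# 2 second       2      19       8   29
# 3  third       3      34       7   44
```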

Upvotes: 2
