Username
Username

Reputation: 3663

How do I summarise all columns except one(s) I specify?

I want to sum up all but one numerical column in this dataframe.

Group, Registered, Votes, Beans
A,     111,        12,     100
A,     111,        13,     200
A,     111,        14,     300

I want to group this by Group, summing up all the columns except Registered.

summarise_if(
  .tbl = group_by(
    .data = x,
    Precinct
  ),
  .predicate = is.numeric,
  .funs = sum
)

Problem here is the result is a data frame that sums ALL the numeric columns, including Registered. How do I sum all but Registered?

The output I want would look like this

Group, Registered, Votes, Beans
A,     111,        39,    600

Upvotes: 19

Views: 22712

Answers (4)

IceCreamToucan
IceCreamToucan

Reputation: 28685

Edit:

Modified versions of the two methods below for dplyr version >= 1, since summarise_at is superseded

df %>% 
  summarise(across(where(is.numeric) & !Registered, sum))

df %>% 
  summarise(across(-Registered, sum))

Original Answer:

I would use summarise_at, and just make a logical vector which is FALSE for non-numeric columns and Registered and TRUE otherwise, i.e.

df %>% 
  summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)

If you wanted to just summarise all but one column you could do

df %>% 
  summarise_at(vars(-Registered), sum)

but in this case you have to check if it's numeric also.

Notes:

  • factors are technically numeric, so if you want to exclude non-numeric columns and factors, replace sapply(df, is.numeric) with sapply(df, function(x) is.numeric(x) & !is.factor(x))

Upvotes: 19

TassosK
TassosK

Reputation: 293

I needed something similar, so by using the answer of @akrun This answer above I did

df <- as_tibble(df)

df %>%
select(-Type) %>%
summarise_all(sum)

Where "Type" is the non-numeric (char) column for example in the iris data set, Type is the "Species" columns. So I got the sum of all the other columns which happens to be numeric.

Upvotes: 1

akrun
akrun

Reputation: 887118

We can use summarise_if

library(dplyr)
df %>% 
   select(-Registered) %>%
   summarise_if(is.numeric, sum)
#  Votes Beans
#1    39   600

Upvotes: 4

AntoniosK
AntoniosK

Reputation: 16121

dt = read.table(text = "
Group Registered Votes Beans
A     111        12     100
A     111        13     200
A     111        14     300
", header=T)

library(dplyr)

# specify grouping variables
v1 = "Group"
v2 = "Registered"

dt %>%
  group_by_(v1, v2) %>%
  summarise_all(sum) %>%
  ungroup()

# # A tibble: 1 x 4
#     Group Registered Votes Beans
#     <fct>      <int> <int> <int>
#   1 A            111    39   600

Note that I have to assume that within each Group value there's a unique Registered value, so you can group by both variables, instead of grouping only by Group and keeping the unique value of Registered.

Upvotes: 2

Related Questions