Anatolye
Anatolye

Reputation: 75

R: Summing of the column values by ranged values of another column

Good day!

I’ve got a table of two columns. In the first column (x) there are values which I want to divide in into categories according to the specified range of values (in my instance – 300). And then using these categories I want to sum values in anther column (v). For instance, using my test data: The first category is from 65100 to 65400 (65100

The result: there is a table of two columns. The first one is the categories of x; the second column is the sum of according values of v.

Thank you!!!

# data
set.seed(1)
x <- sample(seq(65100, 67900, by=5), 100, replace = TRUE)
v <- sample(seq(1000, 8000), 100, replace = TRUE)
tabl <- data.frame(x=c(x), v=c(v))
attach(tabl)
#categories
seq(((min(x) - min(x)%%300) + 300), ((max(x) - max(x)%%300) + 300), by =300)

Upvotes: 1

Views: 72

Answers (3)

Sun Bee
Sun Bee

Reputation: 1820

I understood you want to:

  1. Cut vector x,
  2. Using pre-calculated cut-off thresholds
  3. Compute sums over vector v using those groupings

This is one line of code with data.table and chaining. Your data are in data.table named DT.

DT[, CUT := cut(x, breaks)][, sum(v), by=CUT]

Explanation:

First, assign cut-offs to variable breaks like so.

breaks <- seq(((min(x) - min(x) %% 300) + 300), ((max(x) - max(x) %% 300) + 300), by =300)

Second, compute a new column CUT to group rows by the data in breaks.

DT[, CUT := cut(x, breaks)]

Third, sum on column v in groups, using by=. I have chained this operation with the previous.

DT[, CUT := cut(x, breaks)][, sum(v), by=CUT]

Convert your data.frame to data.table like so.

library(data.table) DT <- as.data.table(tabl)

This is the final result:

                    CUT    V1
 1:  (6.57e+04,6.6e+04] 45493
 2:  (6.6e+04,6.63e+04] 77865
 3: (6.66e+04,6.69e+04] 22893
 4: (6.75e+04,6.78e+04] 61738
 5: (6.54e+04,6.57e+04] 44805
 6: (6.69e+04,6.72e+04] 64079
 7:                  NA 33234
 8: (6.72e+04,6.75e+04] 66517
 9: (6.63e+04,6.66e+04] 43887
10: (6.78e+04,6.81e+04]  172

You can dress this up to improve aesthetics. For example, you can reset the factor levels for ease of reading.

Upvotes: 1

989
989

Reputation: 12937

Try this (no package needed):

s <- seq(65100, max(tabl$x)+300, 300)
tabl$col = as.vector(cut(tabl$x, breaks = s, labels = 1:10))
df <- aggregate(v~col, tabl, sum)

   # col     v
# 1    1 33234
# 2    2 44805
# 3    3 45493
# 4    4 77865
# 5    5 43887
# 6    6 22893
# 7    7 64079
# 8    8 66517
# 9    9 61738
# 10  10  1722

Upvotes: 0

Wietze314
Wietze314

Reputation: 6020

When I use dplyr I am used to do it like this. Although I like the cut solution too.

# data
set.seed(1)
x <- sample(seq(65100, 67900, by=5), 100, replace = TRUE)
v <- sample(seq(1000, 8000), 100, replace = TRUE)
tabl <- data.frame(group=c(x), value=c(v))
attach(tabl)
#categories
s <- seq(((min(x) - min(x)%%300) + 300), ((max(x) - max(x)%%300) + 300), by =300)

 tabl %>% rowwise() %>% mutate(g = s[min(which(group < s), na.rm=T)]) %>% ungroup() %>%
   group_by(g) %>% summarise(sumvalue = sum(value))

result:

 g        sumvalue
 <dbl>    <int>
 65400    28552
 65700    49487
 66000    45493
 66300    77865
 66600    43887
 66900    21187
 67200    65785
 67500    66517
 67800    61738
 68100     1722     

Upvotes: 0

Related Questions