altermann
altermann

Reputation: 79

How to sum one column values and group them by intervals from another column

I'm newbie to R and have a data frame with 25k rows and would like to group the sum of "Freq" inputs within a range of "Var1" (let's say from 5 to 5).

Idea is to have less rows and create a histogram.

Here are 20 rows for simplicity:

Var1 <- c(0:19)
Freq <- c(289, 370, 2295, 2691, 2206, 1624, 1267, 1076, 971, 889, 891, 834, 866, 780, 794, 809, 772, 740, 742, 734)

df <- data.frame(Var1, Freq)

Here is what I would expect:

Var1_intervals <- c("0 - 4", "5 - 9", "10 - 14", "15-19")
Freq_sum <- c(7851, 5837, 4165, 3797)

df_2 <- data.frame(Var1_intervals, Freq_sum)

Upvotes: 4

Views: 353

Answers (5)

TarJae
TarJae

Reputation: 78917

Make ID column for your interval (in this case 5), then group and summarise

library(dplyr)
df %>% 
  mutate(Intervals_by5 = rep(row_number(), each=5, length.out = n())) %>% 
  group_by(Intervals_by5) %>% 
  summarise(Freq_sum = sum(Freq)) 

Output:

  Intervals_by5 Freq_sum
          <int>    <dbl>
1             1     7851
2             2     5827
3             3     4165
4             4     3797

Upvotes: 4

akrun
akrun

Reputation: 886998

An option with data.table

library(data.table)
setDT(df)[, .(n = sum(Freq)), by = .(Group = findInterval(Var1, (0:4) * 5))]

Upvotes: 3

Alex
Alex

Reputation: 474

And another solution using dplyr:

df %>%
    group_by(cut(Var1, (0:4)*5, right = FALSE)) %>%
    summarise(n = sum(Freq))

Upvotes: 4

ThomasIsCoding
ThomasIsCoding

Reputation: 101159

Here is another base R option using tapply

with(
    df,
     data.frame(
         Var1 = tapply(Var1,ceiling(seq_along(Var1)/5),function(x) paste0(range(x),collapse = "-")),
         Freq = tapply(Freq,ceiling(seq_along(Freq)/5),sum)              
     )
    )

which gives

   Var1 Freq
1   0-4 7851
2   5-9 5827
3 10-14 4165
4 15-19 3797

Upvotes: 3

GKi
GKi

Reputation: 39647

You can use aggregate and cut to sum up per interval.

aggregate(df["Freq"], list(cut(df$Var1, (0:4)*5, right = FALSE)), sum)
#  Group.1 Freq
#1   [0,5) 7851
#2  [5,10) 5827
#3 [10,15) 4165
#4 [15,20) 3797

Upvotes: 4

Related Questions