Rezgar

Reputation: 55

Classify data into equal-size groups

I want to divide my data into classes, each with a width of 10, such as:

Original data:

   variable
    10
    20
    33
    23
    8
    14
    16
    40

Desired output:

variable     classify    group_classify
    10       10-20             2
    20       20-30             3
    33       30-40             4
    23       20-30             3
    8        0-10              1
    14       10-20             2
    16       10-20             2
    40       40-50             5

Upvotes: 2

Views: 329

Answers (2)

Ronak Shah

Reputation: 389325

You can use the floor function:

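# Class index: values in [0, 10) get 1, values in [10, 20) get 2, and so on
df$group_classify <- floor(df$variable/10) + 1
# Label each class with its lower and upper bounds
df$classify <- paste((df$group_classify - 1) * 10,
                      df$group_classify * 10, sep = '-')
df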

#  variable group_classify classify
#1       10              2    10-20
#2       20              3    20-30
#3       33              4    30-40
#4       23              3    20-30
#5        8              1     0-10
#6       14              2    10-20
#7       16              2    10-20
#8       40              5    40-50
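The same idea can be wrapped in a small function so the class width is not hard-coded. This is only a sketch: the name classify_width and its width argument are hypothetical and not part of the answer above, and it assumes non-negative values so that the first class starts at 0.

```r
# Hypothetical helper: bin non-negative values into classes of a given width.
classify_width <- function(x, width = 10) {
  group <- floor(x / width) + 1   # 1-based class index; [0, width) is class 1
  lower <- (group - 1) * width    # lower bound of each value's class
  data.frame(variable = x,
             classify = paste(lower, lower + width, sep = "-"),
             group_classify = group)
}

res <- classify_width(c(10, 20, 33, 23, 8, 14, 16, 40))
res
```

Calling it with width = 10 should reproduce the table above; another width, such as classify_width(x, width = 5), only changes the class boundaries.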

data

df <- structure(list(variable = c(10, 20, 33, 23, 8, 14, 16, 40)), 
      class = "data.frame", row.names = c(NA, -8L))

Upvotes: 1

Rui Barradas

Reputation: 76673

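Use cut with breaks every 10, but be careful with the end-points of the intervals. With right = FALSE each interval is closed on the left and open on the right, so a value of exactly 20 goes into [20,30), and include.lowest = TRUE also closes the last interval on the right.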

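# Breaks at multiples of 10 that span the range of the data
brks <- seq(from = min(variable %/% 10) * 10,
            to = (max(variable %/% 10) + 1) * 10,
            by = 10)
# Left-closed intervals [a, b); include.lowest = TRUE also closes the last one
classify <- cut(variable, breaks = brks, include.lowest = TRUE, right = FALSE)
# Group number = position of each value's level among the factor levels
group <- match(classify, levels(classify))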

data.frame(variable, classify, group)
#  variable classify group
#1       10  [10,20)     2
#2       20  [20,30)     3
#3       33  [30,40)     4
#4       23  [20,30)     3
#5        8   [0,10)     1
#6       14  [10,20)     2
#7       16  [10,20)     2
#8       40  [40,50]     5
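To see why the end-points matter, here is a minimal sketch comparing the two settings of right on the same breaks. The column names left_closed and right_closed are made up for illustration, and the breaks are written out directly rather than computed from the data.

```r
variable <- c(10, 20, 33, 23, 8, 14, 16, 40)
brks <- seq(0, 50, by = 10)

# right = FALSE: intervals are [a, b), so 10 falls in [10,20)
left_closed  <- cut(variable, breaks = brks, right = FALSE, include.lowest = TRUE)
# right = TRUE: intervals are (a, b], so 10 falls in the first class instead
right_closed <- cut(variable, breaks = brks, right = TRUE,  include.lowest = TRUE)

data.frame(variable, left_closed, right_closed)
```

Values that sit exactly on a break (10, 20 and 40 here) move down one class when right = TRUE, which is why the question's expected output needs right = FALSE.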

Data

To read in the data as posted, copy and paste the following into an R session and run it:

variable <- scan(text = "
10
20
33
23
8
14
16
40
")

Posting the output of dput(variable) makes it simpler for SO users to reproduce the data:

variable <- c(10, 20, 33, 23, 8, 14, 16, 40)

Upvotes: 1
