Maral Dorri

Reputation: 478

Group data based on intervals and assign group to new column

I have the following data:

> dput(DF)
structure(list(NAME = c("Gait", "Roc", "Bo", "Hernd", 
"Bet", "Oln", "Gai", "Rock", "Mil", "Arli", "Re", "Fred", "Ro", 
"Rock", "Wheat", "Germa", "Rock", "Nort", "Arli", 
"Rockv"), AGE = c(33, 43, 37, 45, 44, 35, 22, 30, 
38, 23, 45, 43, 67, 43, 28, 47, 16, 29, 22, 31)), 
class = "data.frame", row.names = c(NA, -20L))

I want to group the data by AGE into specific intervals: the first group covers ages 0-19, and the remaining groups are 10-year intervals (20-29, 30-39, and so on) up to the maximum AGE. The group number should be stored in a new column.

Desired output is:

    NAME AGE GROUP
1   Gait  33     3
2    Roc  43     4
3     Bo  37     3
4  Hernd  45     4
5    Bet  44     4
6    Oln  35     3
7    Gai  22     2
8   Rock  30     3
9    Mil  38     3
10  Arli  23     2
11    Re  45     4
12  Fred  43     4
13    Ro  67     6
14  Rock  43     4
15 Wheat  28     2
16 Germa  47     4
17  Rock  16     1
18  Nort  29     2
19  Arli  22     2
20 Rockv  31     3

Please keep in mind that this is just a sample; the actual data is larger. My goal is to have one irregular interval for group 1 (0-19), while all remaining groups span the same range of 10 years.

Upvotes: 0

Views: 511

Answers (1)

Ronak Shah

Reputation: 388817

You can use cut to create the groups from a vector of break points. With labels = FALSE it returns the integer group number instead of a factor.

transform(DF, GROUP = cut(AGE, c(0, seq(19, max(AGE) + 10, 10)), labels = FALSE))

#    NAME AGE GROUP
#1   Gait  33     3
#2    Roc  43     4
#3     Bo  37     3
#4  Hernd  45     4
#5    Bet  44     4
#6    Oln  35     3
#7    Gai  22     2
#8   Rock  30     3
#9    Mil  38     3
#10  Arli  23     2
#11    Re  45     4
#12  Fred  43     4
#13    Ro  67     6
#14  Rock  43     4
#15 Wheat  28     2
#16 Germa  47     4
#17  Rock  16     1
#18  Nort  29     2
#19  Arli  22     2
#20 Rockv  31     3
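
If a readable range is wanted alongside the group number, the same break points can be turned into labels. This is only a sketch: the three-row DF is a stand-in built from AGE values in the question, and the names breaks, lower, upper, age_labels and RANGE are made up for illustration.

```r
# Small stand-in for DF, using three AGE values from the question's sample
DF <- data.frame(NAME = c("Gait", "Ro", "Rock"), AGE = c(33, 67, 16))

# Same break points as in the transform() call above
breaks <- c(0, seq(19, max(DF$AGE) + 10, 10))

# One label per interval: "0-19", "20-29", "30-39", ...
upper <- breaks[-1]
lower <- c(0, head(upper, -1) + 1)
age_labels <- paste(lower, upper, sep = "-")

# Integer group number, as in the answer
DF$GROUP <- cut(DF$AGE, breaks, labels = FALSE)

# Factor with a readable label for each group (illustrative extra column)
DF$RANGE <- cut(DF$AGE, breaks, labels = age_labels)

DF
#   NAME AGE GROUP RANGE
# 1 Gait  33     3 30-39
# 2   Ro  67     6 60-69
# 3 Rock  16     1  0-19
```

The labels assume whole-number ages; with fractional ages a label such as "20-29" would also cover values like 19.5.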

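The key part is the vector of break points, built with c and seq. cut treats each interval as open on the left and closed on the right, so these breaks define (0, 19], (19, 29], (29, 39] and so on; for whole-number ages that is 1-19, 20-29, 30-39, etc. An AGE of exactly 0 falls outside the first interval and returns NA unless include.lowest = TRUE is passed to cut.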

c(0, seq(19, max(DF$AGE) + 10, 10))
#[1]  0 19 29 39 49 59 69
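
To see how these break points behave at the edges, the sketch below runs a few boundary ages through cut. The ages vector is hypothetical test input, not part of the question's data, and the arithmetic version at the end is an alternative that assumes whole-number, non-negative ages.

```r
# Break points for the sample, where max(AGE) is 67
breaks <- c(0, seq(19, 67 + 10, 10))

# Hypothetical boundary ages, chosen to probe the interval edges
ages <- c(0, 19, 20, 29, 30, 67)

# Default: intervals are (lo, hi], so an age of 0 is not in (0, 19]
default_groups <- cut(ages, breaks, labels = FALSE)
default_groups
# [1] NA  1  2  2  3  6

# include.lowest = TRUE closes the first interval on the left: [0, 19]
closed_groups <- cut(ages, breaks, labels = FALSE, include.lowest = TRUE)
closed_groups
# [1] 1 1 2 2 3 6

# Arithmetic alternative (assumes whole-number, non-negative ages):
# the decade number, with ages below 10 pushed up into group 1
arith_groups <- pmax(ages %/% 10, 1)
arith_groups
# [1] 1 1 2 2 3 6
```

The arithmetic form avoids building a break vector, but it disagrees with cut for fractional ages such as 19.5, which cut places in group 2.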

Upvotes: 2
