BD'auria

Reputation: 135

How to create a new column based on a specific interval

I would like to create a new column in my data frame based on intervals of another column, "dim".

For example, my data set is:

df1
id dim
1  25
2  34
3  60
4  65
5  80
6  82
7  90
8  95
9  110
10 120

I would like the following data set, with a new column x built from intervals of width 20 (my column begins at 25).
Factors: 25-44 = 1, 45-64 = 2, and so on...
df2
id dim x
1  25  1
2  34  1
3  60  2
4  65  3
5  80  3
6  82  3
7  90  4
8  95  4
9  110 5
10 120 5 

Could someone help me with that?

Upvotes: 1

Views: 1587

Answers (3)

Here is a tidyverse solution using cut.

library(tidyverse)
df %>%
  mutate(x = cut(dim,
                 # Extend the breaks one full interval past the maximum so the largest value falls inside a bin.
                 breaks = seq(min(dim), max(dim) + 20, by = 20),
                 # Use left-closed intervals [25, 45), [45, 65), ... so that 65 starts bin 3.
                 right = FALSE,
                 # Return integer bin codes instead of interval labels.
                 labels = FALSE))

#   id dim x
#1   1  25 1
#2   2  34 1
#3   3  60 2
#4   4  65 3
#5   5  80 3
#6   6  82 3
#7   7  90 4
#8   8  95 4
#9   9 110 5
#10 10 120 5

Upvotes: 1

akrun

Reputation: 887118

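We can use integer division (%/%) on the difference between 'dim' and the first value of 'dim'. This assumes the rows are sorted by 'dim', so that the first value is also the smallest.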

library(dplyr)
df %>% 
   mutate(x = (dim - first(dim)) %/% 20 + 1)
#   id dim x
#1   1  25 1
#2   2  34 1
#3   3  60 2
#4   4  65 3
#5   5  80 3
#6   6  82 3
#7   7  90 4
#8   8  95 4
#9   9 110 5
#10 10 120 5

Or an option with findInterval, using breaks that start at the smallest 'dim' and step by 20

df %>% 
   mutate(x = findInterval(dim, seq(min(dim), max(dim), by = 20)))

data

df <- structure(list(id = 1:10, dim = c(25, 34, 60, 65, 80, 82, 90, 
95, 110, 120)), class = "data.frame", row.names = c(NA, -10L))

Upvotes: 1

Aaron Montgomery

Reputation: 1387

You can do this with floor and some math:

df <- data.frame(id = 1:10, dim = c(25, 34, 60, 65, 80, 82, 90, 95, 110, 120))
df$x <- floor((df$dim - min(df$dim)) / 20) + 1

#   id dim x
#1   1  25 1
#2   2  34 1
#3   3  60 2
#4   4  65 3
#5   5  80 3
#6   6  82 3
#7   7  90 4
#8   8  95 4
#9   9 110 5
#10 10 120 5

Subtract the smallest entry in df$dim so that the first category starts at zero; divide by the interval size, 20; take the floor to round down; and add 1 so that the categories start at 1 instead of 0.
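
If you need this in more than one place, the same arithmetic can be wrapped in a small function. The following is only a sketch: bin_by_width and its arguments width and start are names made up for illustration, not part of base R or any package. It uses min(values) as the default starting point, so it should not depend on the rows being sorted.

```r
# Hypothetical helper (the name and arguments are illustrative, not from any package).
bin_by_width <- function(values, width, start = min(values)) {
  stopifnot(is.numeric(values), width > 0)
  # Integer division counts how many full widths fit between start and each value;
  # adding 1 makes the first bin number 1 instead of 0.
  as.integer((values - start) %/% width) + 1L
}

df <- data.frame(id = 1:10,
                 dim = c(25, 34, 60, 65, 80, 82, 90, 95, 110, 120))

# Reverse the rows to check that the result does not rely on row order.
shuffled <- df[rev(seq_len(nrow(df))), ]
shuffled$x <- bin_by_width(shuffled$dim, width = 20)

# Restore the original order for display.
shuffled[order(shuffled$id), ]
```

Passing an explicit start (for example start = 0) would anchor the bins at a fixed value rather than at the smallest observation, which may be preferable if the same bins have to be reused across several data sets.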

Upvotes: 1
