Reputation: 135
I would like to create a new column in my data frame based on intervals of another column, "dim".
For example, my data set is:
df1
id dim
1 25
2 34
3 60
4 65
5 80
6 82
7 90
8 95
9 110
10 120
I would like to get the data set below, with a new column x built from intervals of width 20 (my column begins at 25).
The groups are: 25-44 = 1, 45-64 = 2, and so on.
df2
id dim x
1 25 1
2 34 1
3 60 2
4 65 3
5 80 3
6 82 3
7 90 4
8 95 4
9 110 5
10 120 5
Could someone help me with that?
Upvotes: 1
Views: 1587
Reputation: 5620
Here is a tidyverse solution using cut.
library(tidyverse)
df %>%
  mutate(x = cut(dim,
                 # Extend the breaks one full interval past max(dim) so the largest value falls inside a bin.
                 breaks = seq(min(dim), max(dim) + 20, 20),
                 # Close the intervals on the left, [25, 45), [45, 65), ..., so that 65 lands in bin 3.
                 right = FALSE,
                 # Return integer bin codes instead of factor labels.
                 labels = FALSE))
# id dim x
#1 1 25 1
#2 2 34 1
#3 3 60 2
#4 4 65 3
#5 5 80 3
#6 6 82 3
#7 7 90 4
#8 8 95 4
#9 9 110 5
#10 10 120 5
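The boundary value 65 is the sensitive case with cut, because the result depends on which side of each interval is closed. As a minimal base R sketch (the names dim_values, breaks, right_closed and left_closed are made up for illustration), the two settings can be compared side by side:

```r
# Illustrative data: the 'dim' column from the question.
dim_values <- c(25, 34, 60, 65, 80, 82, 90, 95, 110, 120)

# Breaks every 20 units, starting at the minimum and extending past the maximum.
breaks <- seq(min(dim_values), max(dim_values) + 20, by = 20)

# Default (right = TRUE): intervals like (45, 65], so 65 lands in bin 2.
right_closed <- cut(dim_values, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# right = FALSE: intervals like [65, 85), so 65 lands in bin 3.
left_closed <- cut(dim_values, breaks = breaks, right = FALSE, labels = FALSE)

print(data.frame(dim = dim_values, right_closed, left_closed))
```

Only the row for 65 differs between the two columns; the left-closed version matches the 25-44 = 1, 45-64 = 2 scheme asked for in the question.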
Upvotes: 1
Reputation: 887118
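We can use %/% (integer division) on the difference between 'dim' and the first value of 'dim'. This assumes 'dim' is sorted in ascending order, so that the first value is also the smallest.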
library(dplyr)
df %>%
mutate(x = (dim - first(dim)) %/% 20 + 1)
# id dim x
#1 1 25 1
#2 2 34 1
#3 3 60 2
#4 4 65 3
#5 5 80 3
#6 6 82 3
#7 7 90 4
#8 8 95 4
#9 9 110 5
#10 10 120 5
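If the rows are not sorted by 'dim', the first value is no longer the smallest one, and using it as the origin shifts the bins. A small base R sketch of the difference, using a shuffled subset of the question's values (the names shuffled, from_first and from_min are illustrative only):

```r
# Illustrative data: some of the question's values in shuffled order.
shuffled <- c(80, 25, 120, 65, 34)

# Origin = first element: values below 80 get zero or negative bin numbers.
from_first <- (shuffled - shuffled[1]) %/% 20 + 1

# Origin = minimum: bins follow the intended 25-44 = 1, 45-64 = 2, ... scheme.
from_min <- (shuffled - min(shuffled)) %/% 20 + 1

print(data.frame(dim = shuffled, from_first, from_min))
```

Replacing first(dim) with min(dim) in the mutate call should therefore be the safer choice when the order of the rows is not guaranteed.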
Or an option with findInterval, using break points that start at the smallest value of 'dim' and step by 20:
df %>%
  mutate(x = findInterval(dim, seq(min(dim), max(dim), by = 20)))
Data:
df <- structure(list(id = 1:10, dim = c(25, 34, 60, 65, 80, 82, 90,
95, 110, 120)), class = "data.frame", row.names = c(NA, -10L))
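To make the findInterval approach easier to follow, here is a rough base R sketch that builds the lower edge of each bin explicitly. It assumes the bins should start at the smallest value of 'dim'; the names dim_values and lower_edges are illustrative.

```r
# Illustrative data: the 'dim' column from the question.
dim_values <- c(25, 34, 60, 65, 80, 82, 90, 95, 110, 120)

# Lower edge of each bin: 25, 45, 65, 85, 105.
lower_edges <- seq(min(dim_values), max(dim_values), by = 20)

# For each value, findInterval returns the index of the last edge that is <= the value.
x <- findInterval(dim_values, lower_edges)

print(data.frame(dim = dim_values, x))
```

Values at or above the last edge (here 110 and 120) get the index of that last edge, so no extra break past the maximum is needed.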
Upvotes: 1
Reputation: 1387
You can do this with floor and some math:
df <- data.frame(id = 1:10, dim = c(25, 34, 60, 65, 80, 82, 90, 95, 110, 120))
df$x <- floor((df$dim - min(df$dim)) / 20) + 1
# id dim x
#1 1 25 1
#2 2 34 1
#3 3 60 2
#4 4 65 3
#5 5 80 3
#6 6 82 3
#7 7 90 4
#8 8 95 4
#9 9 110 5
#10 10 120 5
Subtract the smallest entry in df$dim so that the first category starts at zero; divide by 20, the interval width; take the floor to round down; and add 1 so that the categories start at 1 instead of 0.
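The same arithmetic can be wrapped in a small function so that the width and the starting point are explicit. This is only a sketch: bin_by_width and its arguments width and origin are hypothetical names, not part of base R or any package.

```r
# Hypothetical helper: 1-based bin numbers for bins of a fixed width.
# 'origin' is the lower edge of bin 1 and defaults to the smallest value.
bin_by_width <- function(values, width = 20, origin = min(values, na.rm = TRUE)) {
  stopifnot(is.numeric(values), is.numeric(width), width > 0)
  floor((values - origin) / width) + 1
}

df <- data.frame(id = 1:10, dim = c(25, 34, 60, 65, 80, 82, 90, 95, 110, 120))

# Bins starting at the minimum of dim (25): 25-44 = 1, 45-64 = 2, ...
df$x <- bin_by_width(df$dim)

# Bins starting at a fixed origin of 0 instead: 0-19 = 1, 20-39 = 2, ...
df$x_from_zero <- bin_by_width(df$dim, width = 20, origin = 0)

print(df)
```

Passing a fixed origin can be useful when the same bins must apply to several data sets whose minimum values differ.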
Upvotes: 1