Sharath

Reputation: 2428

Create a range from the available values in R

The code below uses the cut function with hard-coded break values. That is fine for this small sample, but my real data has more than 10 million records, so identifying the ranges for the amount variable by hand is quite difficult.

So my questions are:

  1. Is it possible to create the ranges from the values available for amount in the data.frame?
  2. In the code below, the group labels are displayed in exponential notation. How can I avoid that? Setting scipen=999 does not help.

options(scipen=999)

id = 1:30
amount = c(30185, 33894, 33642, 29439, 27879 ,52347, 4101, 5425,
            6541, 54589, 5214, 1000, 45000, 64125, 100021, 120000, 
            657412, 15224,4578, 3639, 10000, 48781, 64484, 5020,
            15001, 105050, 14521, 59822, 42871, 32542)

df = data.frame(id, amount)
df$group = cut(df$amount,c(10000, 20000, 30000, 40000, 50000, 60000, 70000))

Output for df: [image not available]

Upvotes: 1

Views: 72

Answers (2)

matteo

Reputation: 311

You can let the cut function choose the cut points by passing a single integer n as breaks instead of specifying the cut points manually. The function will then create n equal-length intervals automatically.

To adjust the number of digits used in the interval labels, set the optional argument dig.lab to the maximum number of digits in your labels.

In your example, you could use the following:

df$group = cut(df$amount,breaks=7, dig.lab=6)

Result:

> df
   id amount             group
1   1  30185 (343.588,94773.1]
2   2  33894 (343.588,94773.1]
3   3  33642 (343.588,94773.1]
4   4  29439 (343.588,94773.1]
5   5  27879 (343.588,94773.1]
6   6  52347 (343.588,94773.1]
7   7   4101 (343.588,94773.1]
8   8   5425 (343.588,94773.1]
9   9   6541 (343.588,94773.1]
10 10  54589 (343.588,94773.1]
11 11   5214 (343.588,94773.1]
...

Edit: To have more regular labels, set the cut points using the seq function. For example:

> df$group = cut(df$amount,breaks=seq(0,700000,25000), dig.lab=6)
> head(df)
  id amount         group
1  1  30185 (25000,50000]
2  2  33894 (25000,50000]
3  3  33642 (25000,50000]
4  4  29439 (25000,50000]
5  5  27879 (25000,50000]
6  6  52347 (50000,75000]

This creates cut points 25000 apart. Note that you need to specify the min and max of the range (here I set 0 and 700000).
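
If hard-coding the min and max is not practical (for example with millions of rows), the endpoints can be derived from the data instead. The following is only a sketch under assumptions: the bin width of 25000 and the names width, lower, upper and breaks are illustrative choices, not something prescribed by cut.

```r
# Sample amounts from the question
amount <- c(30185, 33894, 33642, 29439, 27879, 52347, 4101, 5425,
            6541, 54589, 5214, 1000, 45000, 64125, 100021, 120000,
            657412, 15224, 4578, 3639, 10000, 48781, 64484, 5020,
            15001, 105050, 14521, 59822, 42871, 32542)

width <- 25000  # assumed bin width; pick whatever suits your data

# Round the observed range outwards to whole multiples of the bin width
lower <- floor(min(amount) / width) * width
upper <- ceiling(max(amount) / width) * width

breaks <- seq(lower, upper, by = width)

# include.lowest = TRUE keeps a value that equals the lowest break
group <- cut(amount, breaks = breaks, dig.lab = 6, include.lowest = TRUE)

print(range(breaks))
print(head(group))
```

Because the breaks span the whole observed range, no amount ends up as NA. If the values have more than 6 digits, dig.lab would need to be raised accordingly.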

Upvotes: 1

myincas

Reputation: 1550

In cut(x, breaks), breaks is either a numeric vector of two or more unique cut points, or a single number (greater than or equal to 2) giving the number of intervals into which x is to be cut.

You can set dig.lab to keep the interval labels from being displayed in exponential notation.

df$group = cut(df$amount,c(10000, 20000, 30000, 40000, 50000, 60000, 70000), dig.lab = 10)
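
To see the effect of dig.lab, it can help to compare the factor levels with and without it. The snippet below is a small sketch using five amounts taken from the question. The names default_labels, wide_labels and grouped are illustrative only. It also shows that amounts outside the supplied breaks become NA.

```r
breaks <- c(10000, 20000, 30000, 40000, 50000, 60000, 70000)
amount <- c(4101, 15224, 30185, 64125, 657412)

# Labels with the default dig.lab (3 significant digits)
default_labels <- levels(cut(amount, breaks))

# Labels with enough digits to write the break values out in full
wide_labels <- levels(cut(amount, breaks, dig.lab = 10))

print(default_labels)
print(wide_labels)

# Amounts below 10000 or above 70000 fall outside every interval
grouped <- cut(amount, breaks, dig.lab = 10)
print(is.na(grouped))
```

With dig.lab = 10 the first level reads (10000,20000]. The first and last amounts here (4101 and 657412) are NA because the breaks do not cover them.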

Upvotes: 0
