Patthebug

Reputation: 4797

Label a dataset according to bins of a histogram

I have a dataframe with over 40k rows. This dataset has 2 columns, AccountNumber and NumberOfContacts. I created a histogram using the following code:

library(ggplot2)
p <- ggplot(contactsInfo, aes(x = NumberOfContacts)) + geom_histogram(binwidth = 10) +
  xlim(10, 300) + xlab("Number of contacts") + ylab("Number of accounts")
p

I would now like to add a column called 'Bin' to my original dataframe that records which histogram bin each account falls into.

For example:

If an AccountNumber has 0-10 contacts, the Bin column should be 1 for that AccountNumber.

Similarly, if an AccountNumber has 50-60 contacts, Bin should be 6, and so on.

The only approach I can think of is a long chain of nested ifelse statements. I was hoping there's an easier way.

Any help would be much appreciated.

Upvotes: 2

Views: 77

Answers (2)

tumultous_rooster

Reputation: 12560

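I don't know all the details of your dataset, but you can use mutate from the dplyr package (mutate returns a new data frame, so assign the result back):

library(dplyr)
contactsInfo <- mutate(contactsInfo, Bin = floor(NumberOfContacts / 10))

This numbers the bins from 0 (0-9 contacts gives 0, 50-59 gives 5); use floor(NumberOfContacts / 10) + 1 if 0-9 contacts should be bin 1.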
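The same floor-based calculation also works in base R without dplyr. A minimal sketch on a small, hypothetical sample (the column names follow the question; the values are made up), using the question's 1-based numbering:

```r
# Hypothetical sample standing in for the real 40k-row contactsInfo
contactsInfo <- data.frame(
  AccountNumber    = c(101, 102, 103, 104, 105),
  NumberOfContacts = c(0, 9, 10, 55, 299)
)

# Divide by the bin width (10), round down, then add 1 so 0-9 contacts is bin 1
contactsInfo$Bin <- floor(contactsInfo$NumberOfContacts / 10) + 1

contactsInfo$Bin
# 1 1 2 6 30
```

Because the arithmetic is vectorised, it runs over every row at once with no ifelse chain. Note that a count sitting exactly on a boundary (10, 20, ...) lands in the higher bin.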

Upvotes: 3

Michele Usuelli

Reputation: 2000

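You can use something like

contactsInfo$Bin <- cut(contactsInfo$NumberOfContacts, right = FALSE, labels = FALSE,
                        breaks = seq(0, 10 * (floor(max(contactsInfo$NumberOfContacts) / 10) + 1), by = 10))

With labels = FALSE, cut returns the integer index of each interval ([0,10) is 1, [10,20) is 2, and so on). The breaks must span the full range of the data: with breaks = seq(0, 100, 10) and the default right = TRUE, any count above 100, or exactly 0, would become NA.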
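Base R's findInterval is another option: it needs only the lower edge of each bin, so it also handles bins of unequal width. A minimal sketch; the sample data and the unequal edges c(0, 10, 50, 100) are hypothetical:

```r
# Hypothetical sample data
contactsInfo <- data.frame(
  AccountNumber    = c(101, 102, 103, 104, 105),
  NumberOfContacts = c(0, 9, 10, 55, 299)
)

# Equal-width bins: findInterval returns i such that edges[i] <= x < edges[i + 1],
# and the last index for anything at or above the final edge
edges <- seq(0, max(contactsInfo$NumberOfContacts), by = 10)
contactsInfo$Bin <- findInterval(contactsInfo$NumberOfContacts, edges)
# Bin is 1 1 2 6 30

# Unequal widths (hypothetical edges): [0,10) is 1, [10,50) is 2, [50,100) is 3, 100+ is 4
contactsInfo$Group <- findInterval(contactsInfo$NumberOfContacts, c(0, 10, 50, 100))
# Group is 1 1 2 3 4
```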

Upvotes: 0
