I have a data frame with over 40k rows. It has two columns, AccountNumber and NumberOfContacts. I created a histogram using the following code:
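library(ggplot2)
# boundary = 0 with closed = "left" gives bins [0,10), [10,20), ...; by default ggplot2 centers the bins on multiples of 10
p <- ggplot(contactsInfo, aes(x = NumberOfContacts)) +
  geom_histogram(binwidth = 10, boundary = 0, closed = "left") + xlim(10, 300) +
  xlab("Number of contacts") + ylab("Number of accounts")
p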
I would now like to add a column called Bin to my original data frame that records which histogram bin each row falls into.
For example, if an AccountNumber has 0-10 contacts, Bin should be 1 for that AccountNumber. Similarly, if an AccountNumber has 50-60 contacts, Bin should be 6, and so on.
I could do this with a long chain of nested ifelse statements, but that would be extremely lengthy. Is there an easier way to achieve this?
Any help would be much appreciated.
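I don't know all the details of your dataset, but you can use mutate from the dplyr package (adding 1 so that 0-9 contacts map to bin 1):
library(dplyr)
contactsInfo <- mutate(contactsInfo, Bin = floor(NumberOfContacts / 10) + 1)  # 0-9 -> 1, 50-59 -> 6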
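To check the arithmetic without dplyr, here is the floor-based formula, with 1 added so the first bin is 1, applied in base R to a small made-up data frame. The account numbers and contact counts are hypothetical, and the sketch assumes half-open bins, so a count of exactly 10 lands in bin 2.

```r
# Hypothetical toy data standing in for contactsInfo
contactsInfo <- data.frame(
  AccountNumber    = c("A1", "A2", "A3", "A4", "A5"),
  NumberOfContacts = c(0, 9, 10, 55, 299)
)

# 10-wide bins numbered from 1: [0,10) -> 1, [10,20) -> 2, [50,60) -> 6, ...
contactsInfo$Bin <- floor(contactsInfo$NumberOfContacts / 10) + 1

print(contactsInfo)
```

Because this is plain vectorized arithmetic, it runs in a single pass over all 40k rows with no ifelse chain.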
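You can use cut, with breaks that run past the largest count (otherwise counts above the last break become NA) and right = FALSE so the bins are [0, 10), [10, 20), and so on:
contactsInfo$Bin <- cut(contactsInfo$NumberOfContacts, right = FALSE, labels = FALSE,
                        breaks = seq(0, max(contactsInfo$NumberOfContacts, na.rm = TRUE) + 10, by = 10))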
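A small sketch on made-up counts shows why the breaks need to cover the whole range. With breaks = seq(0, 100, 10) and the default right = TRUE, the intervals are (0,10], ..., (90,100], so a count of 0 or anything above 100 comes back as NA. Breaks extended past the maximum, together with right = FALSE, give every row a bin that matches floor(n / 10) + 1. The counts vector is hypothetical.

```r
# Hypothetical contact counts, including 0 and values above 100
counts <- c(0, 9, 10, 55, 150, 299)

# Breaks stopping at 100 with the default right = TRUE: intervals (0,10], ..., (90,100]
narrow <- cut(counts, breaks = seq(0, 100, 10), labels = FALSE)

# Breaks extended past the largest count; right = FALSE gives [0,10), [10,20), ...
wide <- cut(counts, breaks = seq(0, max(counts) + 10, by = 10),
            right = FALSE, labels = FALSE)

print(narrow)  # 0, 150 and 299 fall outside every interval
print(wide)

# The half-open version agrees with the floor() arithmetic
stopifnot(all(wide == floor(counts / 10) + 1))
```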