Tom

Reputation: 2341

Using data.table and cut to split a variable into groups with equal observations

I have a very simple question, but I have not been able to work out the right syntax from the answers to more complicated related questions.

I have a data.table which looks as follows:

dat <- read.table(
  text = "A   B   C   D   E   F   G   H   I   J
  A   0   1   1   1   0   1   0   1   1   1
  B   1   0   0   0   1   0   1   0   0   2
  C   0   0   0   1   1   0   0   0   0   3
  D   1   0   1   0   0   1   0   1   0   4
  E   0   1   0   1   0   1   1   0   1   5
  F   0   0   1   0   0   0   1   0   0   6
  G   0   1   0   1   0   0   0   0   0   7
  H   1   0   1   0   0   1   0   0   0   8
  I   0   1   0   1   1   0   1   0   0   9
  J   1   0   1   0   0   1   0   1   0   9",
  header = TRUE
)

Now I would like to use data.table to create a variable called Jcat that divides variable J into 3 categories with a more or less equal number of observations, like this:

dat <- read.table(
  text = "A   B   C   D   E   F   G   H   I   J Jcat
  A   0   1   1   1   0   1   0   1   1   1   1
  B   1   0   0   0   1   0   1   0   0   2   1
  C   0   0   0   1   1   0   0   0   0   3   1
  D   1   0   1   0   0   1   0   1   0   4   2
  E   0   1   0   1   0   1   1   0   1   5   2
  F   0   0   1   0   0   0   1   0   0   6   2
  G   0   1   0   1   0   0   0   0   0   7   3
  H   1   0   1   0   0   1   0   0   0   8   3
  I   0   1   0   1   1   0   1   0   0   9   3
  J   1   0   1   0   0   1   0   1   0   9   3",
  header = TRUE
)

I am struggling with the syntax.

What would be the simplest way to do this?

Upvotes: 1

Views: 383

Answers (1)

akrun

Reputation: 887118

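We can specify the number of intervals in the breaks argument of cut. Note that a single number there splits the range of J into equal-width intervals; for this data that happens to give groups of 3, 3 and 4 observations.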

library(data.table)
n <- 3  # number of groups
# setDT() converts dat to a data.table by reference; := adds Jcat in place
setDT(dat)[, Jcat := as.integer(cut(J, breaks = n))]
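When cut is given a single number as breaks, it bins by interval width rather than by count, so on more skewed data the groups may end up very uneven. If equal group sizes matter more than equal widths, a rank-based variant is one option. The following is only a minimal sketch and not part of the original answer: it assumes that splitting tied values across groups is acceptable (ties.method = "first"), and it rebuilds just the J column of the example data for brevity.

```r
library(data.table)

# Only the J column from the question's example (assumption: the other
# columns are irrelevant to the grouping)
dat <- data.table(J = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 9))
n <- 3  # number of groups

# rank() gives each row a position 1..N; scaling by n / N and rounding up
# maps those positions onto n groups of (nearly) equal size
dat[, Jcat := as.integer(ceiling(rank(J, ties.method = "first") * n / .N))]

# Count the observations per group to check the balance
dat[, .N, by = Jcat]
```

On this data the sketch gives Jcat = 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, the same grouping as the desired output in the question. If tied values must stay in the same group, ties.method = "min" is an alternative, at the cost of less balanced groups.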

Upvotes: 2
