bg49ag

Reputation: 143

Is there a function to invert the number of occurrences of values in a data.table?

Is there a function that inverts the number of occurrences of each value in a data.table, so that the rarest value becomes the most frequent and vice versa, as opposed to merely sorting by frequency? E.g. say I have this:

install.packages('data.table')
require(data.table)

initially = data.table(initially = c('a,a','b,b','b,b','c,c','c,c','c,c'))
View(initially)

And wish to produce this:

required.inversion = data.table(required.inversion = c('a,a','a,a','a,a','b,b','b,b', 'c,c'))
View(required.inversion)

The way I was thinking of doing this was to produce a frequency table:

initial.frequencies = initially[, .N ,by = initially]
View(initial.frequencies)

Sort it to ensure it's in ascending frequency order:

initial.frequencies = initial.frequencies[order(N)]
View(initial.frequencies)

Store the order of those initial values:

inversion.key = initial.frequencies$initially
View(inversion.key)

Re-sort the data.table so it's in descending frequency order:

initial.frequencies = initial.frequencies[order(-N)]
View(initial.frequencies)

Then insert the original order back into the table:

initial.frequencies[, inversion.key := inversion.key]
View(initial.frequencies)

I now have a 'key' that pairs each count in N with the value that should occur that many times in the inverted table. I.e. 'a,a' needs to occur three times, 'b,b' twice and 'c,c' once.

I'm not sure how to actually replicate the values in the original table, and this seems like a bad approach anyway, as it would also double the length of the table:

this.approach.would.yield.this.in.the.ram = data.table(this.approach.would.yield.this.in.the.ram = c('a,a','b,b','b,b','c,c','c,c','c,c', 'a,a','a,a','a,a','b,b','b,b', 'c,c'))
View(this.approach.would.yield.this.in.the.ram)

Upvotes: 3

Views: 164

Answers (1)

akrun

Reputation: 886938

Following the OP's approach, replicate the rows of the frequency table by the reverse of 'N', then drop 'N' by assigning it to NULL:

initially[, .N, by = initially][rep(seq_len(.N), rev(N))][, N := NULL][]
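
The one-liner above reverses the counts in order of first appearance, so it inverts frequencies as intended when the values already appear in frequency order, as in the example. As a hedged sketch only (the table `shuffled` and the column names `value` and `inverted.N` are made up for illustration and are not from the question), the same idea can be applied to unordered input by sorting the frequency table first:

```r
library(data.table)

# Hypothetical input whose values are NOT already in ascending frequency order.
shuffled <- data.table(value = c('c,c', 'a,a', 'c,c', 'b,b', 'c,c', 'b,b'))

freq <- shuffled[, .N, by = value]   # one row per distinct value, with its count N
setorder(freq, N)                    # sort by ascending frequency
freq[, inverted.N := rev(N)]         # rarest value receives the largest count

# Repeat each row index inverted.N times, then keep only the value column.
inverted <- freq[rep(seq_len(.N), inverted.N), .(value)]
inverted[]
```

Because the reversed counts sum to the same total as the original counts, the result has the same number of rows as the input, so nothing is doubled in memory.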

Upvotes: 2
