Reputation: 324
Let's say we have the following data.table
dt = data.table(a=letters[1:20], b = c(rep(1,3),rep(2,7),rep(3,5),rep(4,5)))
that is
a b
1: a 1
2: b 1
3: c 1
4: d 2
5: e 2
6: f 2
7: g 2
8: h 2
9: i 2
10: j 2
11: k 3
12: l 3
13: m 3
14: n 3
15: o 3
16: p 4
17: q 4
18: r 4
19: s 4
20: t 4
and that I want to assign a rank from 0 to 1 to each row, grouped by column b. I'm doing
dt[,len:=.N,by=b][,rank:=c(0:(len-1))/(len-1),by=b][,len:=NULL]
where len is there just to calculate the rank and is then removed. I obtain
a b rank
1: a 1 0.0000000
2: b 1 0.5000000
3: c 1 1.0000000
4: d 2 0.0000000
5: e 2 0.1666667
6: f 2 0.3333333
7: g 2 0.5000000
8: h 2 0.6666667
9: i 2 0.8333333
10: j 2 1.0000000
11: k 3 0.0000000
12: l 3 0.2500000
13: m 3 0.5000000
14: n 3 0.7500000
15: o 3 1.0000000
16: p 4 0.0000000
17: q 4 0.2500000
18: r 4 0.5000000
19: s 4 0.7500000
20: t 4 1.0000000
which is exactly what I want. The problem is that I also get these warnings:
Warning messages:
1: In base::":"(from, to) :
numerical expression has 3 elements: only the first used
2: In base::":"(from, to) :
numerical expression has 7 elements: only the first used
3: In base::":"(from, to) :
numerical expression has 5 elements: only the first used
4: In base::":"(from, to) :
numerical expression has 5 elements: only the first used
I would like to disregard them, and that's fine when the data is small and I can check the result by eye. But since my data.table has thousands of rows, I would like to be sure that these warnings are actually harmless.
What do you think? Or, equivalently, is my method for assigning a 'vector' by group in a data.table allowed? Are there alternatives?
Thank you.
Upvotes: 0
Views: 648
Reputation: 38520
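You are getting the warning from this portion of the code: 0:(len-1). The second argument to :, len-1, is a vector of length .N, but : wants a vector of length 1, so it uses only the first element. Because len is constant within each group, that first element is the value you intended, which is why your result is still correct. You can recreate the warning with (1:2):(2:3) or with seq_len(2):seq_len(2).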
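As a small illustration of the point above, here is a sketch that reproduces the situation outside data.table. The vector len below is a stand-in I made up for what one group of size 3 would hold; it is not taken from the original code.

```r
# Stand-in for the len column inside one group of size 3 (assumed for illustration).
len <- rep(3L, 3)

# Capture the warning message instead of printing it.
msg <- tryCatch(0:(len - 1), warning = function(w) conditionMessage(w))
msg

# With the warning suppressed, only the first element of len - 1 is used,
# so the sequence is the same as 0:2.
x <- suppressWarnings(0:(len - 1))
identical(x, 0:2)
```

Since every element of len is the same within a group, dropping all but the first one changes nothing here. It would matter if the vector on the right of : were not constant.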
The following will calculate what you want in one line without said warning:
dt[, rank := (seq_len(.N) - 1) / (.N - 1), by=b]
dt
a b rank
1: a 1 0.0000000
2: b 1 0.5000000
3: c 1 1.0000000
4: d 2 0.0000000
5: e 2 0.1666667
6: f 2 0.3333333
7: g 2 0.5000000
8: h 2 0.6666667
9: i 2 0.8333333
10: j 2 1.0000000
11: k 3 0.0000000
12: l 3 0.2500000
13: m 3 0.5000000
14: n 3 0.7500000
15: o 3 1.0000000
16: p 4 0.0000000
17: q 4 0.2500000
18: r 4 0.5000000
19: s 4 0.7500000
20: t 4 1.0000000
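If you want to convince yourself that the two approaches agree on larger data, a check along these lines might help. This is only a sketch: the names old, new and safe are mine, and the guard for groups of size 1 is an assumption about what you would want there, since (.N - 1) is 0 in that case and the division gives NaN.

```r
library(data.table)

dt <- data.table(a = letters[1:20],
                 b = c(rep(1, 3), rep(2, 7), rep(3, 5), rep(4, 5)))

# Original approach from the question, warnings silenced only for this comparison.
old <- suppressWarnings(
  copy(dt)[, len := .N, by = b][, rank := c(0:(len - 1)) / (len - 1), by = b][, len := NULL]
)

# Warning-free version.
new <- copy(dt)[, rank := (seq_len(.N) - 1) / (.N - 1), by = b]

# Both produce the same ranks.
all.equal(old$rank, new$rank)

# Optional guard (assumption: a lone row should get rank 0 rather than NaN).
safe <- copy(dt)[, rank := if (.N > 1L) (seq_len(.N) - 1) / (.N - 1) else 0, by = b]
```

Every group in this example has more than one row, so safe gives the same ranks as new; the guard only changes the result for groups with a single row.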
Upvotes: 2