Reputation: 176
I'm trying to operate on a data.table column using a different data.table, and assign the result to a new column in the first data.table. But I keep having this issue:
Warning messages:
1: In from:(from + len) :
numerical expression has 10 elements: only the first used
Here is the data:
tstamps = c(1504306173, NA, NA, NA, NA, 1504393006, NA, NA, 1504459211, NA)
set.seed(0.1)
dt1 = data.table(utc_tstamp = sample(rep(tstamps, 100), 100))
dt2 = data.table(from = sample((1:90), 10), len = sample(1:10, 10))
> dt2
from len
1: 55 6
2: 59 9
3: 32 10
4: 24 3
5: 86 7
6: 54 1
7: 18 5
8: 11 8
9: 40 4
10: 75 2
I'm trying to count the number of NA
's in dt1[from:(from+len), ]
and assign the result to a new column, count, in dt2
.
What I currently have for that is this
dt2[, count := dt1[from:(from+len), ][is.na(utc_tstamp), .N]]
but this is only using dt2[1,]$from
and dt2[1,]$len
, all the counts are just the number of NA
's in dt1[dt2[1,]$from:(dt2[1,]$from + dt2[1,]$len), ]
, and I receive the following warning
Warning messages:
1: In from:(from + len) :
numerical expression has 10 elements: only the first used
2: In from:(from + len) :
numerical expression has 10 elements: only the first used
and the result is this:
> dt2
from len count
1: 55 6 5
2: 59 9 5
3: 32 10 5
4: 24 3 5
5: 86 7 5
6: 54 1 5
7: 18 5 5
8: 11 8 5
9: 40 4 5
10: 75 2 5
while it should be this:
> dt2
from len count
1: 55 6 5
2: 59 9 5
3: 32 10 8
4: 24 3 3
5: 86 7 5
6: 54 1 2
7: 18 5 4
8: 11 8 5
9: 40 4 4
10: 75 2 2
I'd appreciate it if someone explains why this is happening and how can I get what I want.
Upvotes: 4
Views: 124
Reputation: 66819
Or define the joining variables explicitly and use a non-equi join:
dt2[, to := from+len]
dt1[, r := .I]
dt2[, ct := dt1[is.na(utc_tstamp)][dt2, on=.(r >= from, r <= to), .N, by=.EACHI]$N]
Upvotes: 2
Reputation: 887138
Based on the description, we get the sequence between the 'from' and 'from' added with 'len', based on this position index get the corresponding elements of 'utc_stamp' column from 'dt1', convert it to logical (is.na(
), and get the sum
i.e. sum of TRUE elements or the number of NA
elements. Assign (:=
) it to create a new column 'count' in 'df2'
dt2[, count := unlist(Map(function(x, y)
sum(is.na(dt1$utc_tstamp[x:y])), from , from + len))]
dt2
# from len count
# 1: 55 6 5
# 2: 59 9 5
# 3: 32 10 8
# 4: 24 3 3
# 5: 86 7 5
# 6: 54 1 2
# 7: 18 5 4
# 8: 11 8 5
# 9: 40 4 4
#10: 75 2 2
Or another option is to group by sequence of rows and then do the the sequence (:
) based on 'from', 'len' columns to subset the column values from 'dt1' and get the sum
of logical vector
dt2[, count := sum(is.na(dt1$utc_tstamp[from:(from + len)])), by = 1:nrow(dt2)]
Upvotes: 2