Reputation: 111
I have 10,000 rows of data in Excel and the first column is the time of a data recording (hh:mm:ss). I want to filter/extract (in the front end) only the rows where the readings are at least six consecutive seconds - these are signal strength recordings and I only consider readings of at least six consecutive seconds to be valid for my purposes. I would only want to extract the rows in green in the attached image. Separate ask - what if I wanted segments of only six to eight seconds? Thanks!
Upvotes: 1
Views: 160
Reputation: 6769
With O365 (assuming no excel version constraints per tags used in the question) you can try the following in cell D1
:
=LET(rng, A1:B32, ref, SCAN("", INDEX(rng,,1), LAMBDA(ac,a, IF(ac="", 1,
IF((a - TIME(0,0,1)) = OFFSET(a,-1,0), 0, 1)))), size, ROWS(ref),
seq, SEQUENCE(size), start, FILTER(seq, ref=1),
end, VSTACK(DROP(start-1, 1), size), DROP(REDUCE("", start, LAMBDA(ac,s,
LET(e, XLOOKUP(s, start, end), f, FILTER(rng, (seq >= s) * (seq <= e)),
IF(ROWS(f) > 5, VSTACK(ac, f), ac)))),1))
Note: The same result can be achieved to avoid using the volatile function OFFSET
, not using SCAN
, and instead calculating the ref
value using comparison as follow, where A
represents the first column of rng
(we keep the same logic as previous formula, but it can be simplified changing the logic to check 0
instead 1
to remove 1-N()
):
=LET(rng,A1:B32,A,INDEX(rng,,1), ref,VSTACK(1,1-N(DROP(A,1)
=(DROP(A,-1)+TIME(0,0,1)))), size, ROWS(ref), seq, SEQUENCE(size),
start, FILTER(seq, ref=1), end, VSTACK(DROP(start-1, 1), size),
DROP(REDUCE("", start, LAMBDA(ac,s, LET(e, XLOOKUP(s, start, end),
f, FILTER(rng, (seq >= s) * (seq <= e)),
IF(ROWS(f) > 5, VSTACK(ac, f), ac)))),1))
Then adjust the input range (rng
) to your real case.
Using LET
for easy reading and composition. The name ref
identifies with 1
, the start of each group of values with consecutive seconds of the first column from rng
. It has the same number of rows as the input data.
We use DROP/REDUCE/VSTACK
pattern to generate iteratively the output with the data that satisfies the conditions the group has more than 5 consecutive elements. Check the following answer to this question: how to transform a table in Excel from vertical to horizontal but with different length
Via REDUCE
we iterate over all start group positions (start
). For each start group position (s
), finds the corresponding end group position (e
) via XLOOKUP
. Filter the range (rng
) for rows (seq
) between start (s
) and end (e
) rows via FILTER
. Append the filter result (f
) only if the number of rows is bigger than 5
via the VSTACK
function.
The formula provided is too large to be used for a conditional format formula (255
maximum number of characters). A possible workaround could be to use a helper column (you can hide it). It returns TRUE
if the row belongs to a valid group, otherwise returns FALSE
. If no valid group were found it returns FALSE
too (we use this trick: NOT(SEQUENCE(size,,1,0)=1)
to generate a constant column with FALSE
values). Then you can highlight the column A
values that correspond with TRUE
values in the helper column.
=LET(A, TOCOL(A:A,1), ref, VSTACK(1,N(DROP(A,1) = (DROP(A,-1)+TIME(0,0,1)))),
size, ROWS(ref), seq, SEQUENCE(size), start, FILTER(seq, ref=0),
end, VSTACK(DROP(start-1, 1), size), gr, FILTER(HSTACK(start, end),
(end-start +1) > 5, -1), sgr, INDEX(gr,,1), egr, INDEX(gr,,2),
IF(@gr=-1, NOT(SEQUENCE(size,,1,0)=1), MAP(seq, LAMBDA(x,
LET(overlaps, SUM((sgr <= x) * (egr >= x)), IF(overlaps = 1, TRUE, FALSE))))))
as input, we use the entire column and filter by nonblank values via the TOCOL
function (named A
) using the second input argument of this function. In case of more than one overlapping interval, it returns FALSE
too, just for testing purposes, it indicates some error calculating start
and end
names because per design it should never happen.
Tip: The previous formula can be used for the original purpose, using its output as a condition for a FILTER
function to select only the input range where the value is TRUE
. It is a matter of taste which route you want to go. For example, as follow:
=LET(rng,A1:B32,A,INDEX(rng,,1), ref, VSTACK(1,N(DROP(A,1)
= (DROP(A,-1)+TIME(0,0,1)))), size, ROWS(ref), seq, SEQUENCE(size),
start, FILTER(seq, ref=0), end, VSTACK(DROP(start-1, 1), size),
gr, FILTER(HSTACK(start, end), (end-start +1) > 5, -1),
sgr, INDEX(gr,,1), egr, INDEX(gr,,2),
incl, IF(@gr=-1, NOT(SEQUENCE(size,,1,0)=1), MAP(seq, LAMBDA(x,
LET(overlaps, SUM((sgr <= x) * (egr >= x)), IF(overlaps = 1, TRUE, FALSE))))),
FILTER(rng, incl=TRUE, "No Group found"))
Upvotes: 1