Reputation: 91
I wanted to compare the current row value to all previous value either in Tableau or SQL. Below is an example of what I wanted.
So like for colA Row2, I would like to compare it to Row1 if it is greater than. For colA Row 2, I would like to compare it with Row2 and Row3. This will go on and on until it reaches the end of the partition. Then the output is just counting how many scenarios were TRUE (as shown in the right side of the table).
Does anyone know any syntax on how to do this in Tableau or Hive SQL? I already explored the LAG() function in HIVE sql as well as WINDOW_SUM() function in Tableau.
I can opt to create a UDF in Hive but I am not that familiar with the concept nor with the syntaxes.
Thanks for the help.
Upvotes: 2
Views: 1540
Reputation: 11919
When the order of data values affects a calculation, you have two choices as a Tableau client.
Both approaches are useful. Both have quirks and complexities to get around. All things being equal, table calcs are more flexible so I'd try those first. Realize that in addition to the formula, a table calc is also specified by information about how to traverse the query result set - known as partitioning and addressing. See the online help for table calcs.
However, since table calcs are implemented client side, then they aren't the best choice when you have extremely large data sets that would otherwise not need to be fetched to the client. They are a good choice when the information driving your calculations has already been fetched client-side, say aggregate query results.
Upvotes: 0
Reputation: 13407
Try this one:
where gs_test
is your input table with cola
column
select
cola,
row_num,
sum(if(cola>next_el_, 1, 0)) as countOfTrueBefore
from
(
select
cola,
row_num,
collect_set(cola) over (order by row_num asc rows between 1 following and unbounded following) as next_el
from
(
select
cola,
sum(1) over (rows between unbounded preceding and current row) as row_num
from
gs_test
) main_inner
) main
lateral view outer explode(next_el) ne as next_el_ group by cola, row_num;
Result:
cola - original column
row_num - inverse row number (the first row is the last)
countOfTrueBefore - as per your logic
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.91 sec HDFS Read: 7690 HDFS Write: 278 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 7.2 sec HDFS Read: 8314 HDFS Write: 285 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 5.34 sec HDFS Read: 5586 HDFS Write: 63 SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 450 msec
OK
0.01 5 0
0.33 1 1
0.47 6 0
0.48 2 2
0.75 4 2
0.85 7 0
0.88 3 4
Upvotes: 1
Reputation: 1271141
In order for your question to make sense, you need a column that specifies the ordering. I will assume the first columns is an id
column with this infromation.
This is challenging in SQL in general. One method that should work in Hive is a cross join and aggregation.
select t.id, t.colA,
sum(case when t2.id < t.id and t2.colA < t.colA then 1 else 0 end)
from t cross join
t t2
group by t.id, t.colA;
Note: this will not have good performance on even a medium sized table.
Upvotes: 0