Reputation: 4348
I'm using spark and mainly interacting with it via sql (I'm a novice to both, so please bear with me). I have a data set that looks like this in a table called taxonomies
:
level_4 level_5 level_6 level_7 system_id node_id
American League West null null 4 633
American League East null null 4 634
National League West Dodgers bellinger 3 635
National League Central Cardinals null null null
American League Central null null null null
I'm going to get the system_id columns by left joining another table; what I want to do here is get a new node_id for the null values by adding one to the max of the node_id column.
I adapted this query from an answer to a previous question:
WITH cte AS (
SELECT max(node_id) AS maxid FROM taxonomies
)
SELECT cte.maxid + row_number() OVER (ORDER BY node_id) AS node_id, system_id, `level_4`, `level_5`, `level_6`, `level_7`
FROM taxonomies
LEFT JOIN cte WHERE taxonomies.node_id IS null
UNION
SELECT * FROM taxonomies WHERE node_id IS NOT null
What I don't understand is that this appropriately makes new node_id for the null values, but for everything else it swaps system_id and node_id. I've tried switching node_id and system_id after AS, but all this does is change the column names with the same results. What I want is this:
level_4 level_5 level_6 level_7 system_id node_id
American League West null null 4 633
American League East null null 4 634
National League West Dodgers bellinger 3 635
National League Central Cardinals null null 636
American League Central null null null 637
Any help is greatly appreciated!
Upvotes: 0
Views: 95
Reputation: 275
A solution to your problem can be this:
SELECT
system_id,
(min(node_id) over ())-1+row_number() OVER (ORDER BY -node_id desc) node_id,
`level_4`, `level_5`, `level_6`, `level_7`
FROM
taxonomies
If you want to do this using join, this is the way:
SELECT
system_id,
t2.min_node-1+row_number() OVER (ORDER BY -tn.node_id desc) node_id,
`level_4`, `level_5`, `level_6`, `level_7`
FROM
taxonomies tn
cross join
(SELECT min(node_id) as min_node FROM taxonomies) as t2
If you want to do this using sub-query, this is the way:
SELECT
system_id,
(select min(node_id) from taxonomies)-1+row_number() OVER (ORDER BY -node_id desc) node_id,
`level_4`, `level_5`, `level_6`, `level_7`
FROM
taxonomies
Upvotes: 1