Kumar

Reputation: 41

Data arrangement with a window function in Hive or Spark Scala

I have to rearrange the data so that each null value is replaced by the next non-null value.

I/p:

ID |VALUE
1|a
2|null
3|null
4|b
5|null
6|null
7|c

I need this output, using either Hive or a DataFrame.

O/P:

ID|Value
1|a
2|b
3|b
4|b
5|c
6|c
7|c

Upvotes: 0

Views: 63

Answers (2)

notNull

Reputation: 31490

In Spark, use first(expr[, ignoreNulls = true]) over a window ordered by monotonically_increasing_id(), with a frame of rowsBetween from currentRow to unboundedFollowing.

Example:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._

df.show()
//+---+-----+
//| ID|VALUE|
//+---+-----+
//|  1|    a|
//|  2| null|
//|  3| null|
//|  4|    b|
//|  5| null|
//|  6| null|
//|  7|    c|
//+---+-----+

//if ID is sequentially increasing, order by it directly
val w = Window.orderBy("ID").rowsBetween(Window.currentRow, Window.unboundedFollowing)

//otherwise, order by a monotonically increasing id instead:
//val w = Window.orderBy(monotonically_increasing_id()).rowsBetween(Window.currentRow, Window.unboundedFollowing)

df.withColumn("VALUE", first("VALUE", ignoreNulls = true).over(w)).show()

//+---+-----+
//| ID|VALUE|
//+---+-----+
//|  1|    a|
//|  2|    b|
//|  3|    b|
//|  4|    b|
//|  5|    c|
//|  6|    c|
//|  7|    c|
//+---+-----+
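What the window computes is a backward fill: each row takes the first non-null value at or after its position. A minimal plain-Python sketch of that logic (no Spark needed, just to illustrate what first(..., ignoreNulls = true) over a current-row-to-unbounded-following frame does):

```python
# Sample rows matching the question's input; None stands for null.
rows = [(1, "a"), (2, None), (3, None), (4, "b"), (5, None), (6, None), (7, "c")]

# Walk from the last row upward, carrying the most recent non-null value.
filled = []
next_value = None
for id_, value in reversed(rows):
    if value is not None:
        next_value = value          # remember the nearest non-null below/at this row
    filled.append((id_, next_value))
filled.reverse()

print(filled)
# [(1, 'a'), (2, 'b'), (3, 'b'), (4, 'b'), (5, 'c'), (6, 'c'), (7, 'c')]
```

The single reverse pass is O(n), which is also why the window frame must extend to unboundedFollowing rather than unboundedPreceding: the value comes from below, not above.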

Upvotes: 1

leftjoin

Reputation: 38325

Solution for Hive:

with mytable as (
select stack(7,
 1,'a'  ,
 2,null ,
 3,null ,
 4,'b'  ,
 5,null ,
 6,null ,
 7,'c'
) as (id, value)
)

SELECT id, 
       first_value(value,true) over(order by id rows between current row and unbounded following) value
  FROM mytable;

Result:

id  value
1   a
2   b
3   b
4   b
5   c
6   c
7   c

Upvotes: 0
