Reputation: 41
I have to organize the data below so that each null VALUE is filled with the next non-null value that follows it.
Input:
ID |VALUE
1|a
2|null
3|null
4|b
5|null
6|null
7|c
I need the output below, using Hive or a Spark DataFrame.
Output:
ID|VALUE
1|a
2|b
3|b
4|b
5|c
6|c
7|c
Upvotes: 0
Views: 63
Reputation: 31490
In Spark, use the first(expr[, isIgnoreNull=true]) function over a window ordered by monotonically_increasing_id(), with a frame of rowsBetween(Window.currentRow, Window.unboundedFollowing).
Example:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
df.show()
//+---+-----+
//| ID|VALUE|
//+---+-----+
//| 1| a|
//| 2| null|
//| 3| null|
//| 4| b|
//| 5| null|
//| 6| null|
//| 7| c|
//+---+-----+
//if ID is sequentially increasing, order by it directly
val w = Window.orderBy("ID").rowsBetween(Window.currentRow, Window.unboundedFollowing)
//otherwise, order by a generated increasing id instead:
//val w = Window.orderBy(monotonically_increasing_id()).rowsBetween(Window.currentRow, Window.unboundedFollowing)
df.withColumn("VALUE", first("VALUE", ignoreNulls = true).over(w)).show()
//+---+-----+
//| ID|VALUE|
//+---+-----+
//| 1| a|
//| 2| b|
//| 3| b|
//| 4| b|
//| 5| c|
//| 6| c|
//| 7| c|
//+---+-----+
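Conceptually, first over a current-row-to-unbounded-following window with nulls ignored performs a backward fill: each null row takes the first non-null value at or below it. A minimal plain-Python sketch of that scan (the backfill name and list input are illustrative, not Spark API):

```python
def backfill(values):
    """Fill each None with the next non-None value that follows it,
    mirroring first(value, ignoreNulls=true) over
    rowsBetween(currentRow, unboundedFollowing)."""
    filled = list(values)
    next_non_null = None
    # Scan from the bottom up, remembering the latest non-null seen.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_non_null
        else:
            next_non_null = filled[i]
    return filled

print(backfill(["a", None, None, "b", None, None, "c"]))
# ['a', 'b', 'b', 'b', 'c', 'c', 'c']
```

A trailing null has no following non-null value, so it stays null, just as the window function would leave it.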
Upvotes: 1
Reputation: 38325
Solution for Hive:
with mytable as (
  select stack(7,
    1, 'a',
    2, null,
    3, null,
    4, 'b',
    5, null,
    6, null,
    7, 'c'
  ) as (id, value)
)
select id,
       first_value(value, true) over(order by id rows between current row and unbounded following) as value
from mytable;
Result:
id value
1 a
2 b
3 b
4 b
5 c
6 c
7 c
Upvotes: 0