oracletohive

Reputation: 15

Oracle to Spark/Hive: how to convert use of the "greatest" function to a Spark 1.6 DataFrame

The table in Oracle has 37 columns: year, month, d1, d2, ..., d34, and one more column called maxd, which is blank. The data in d1..d34 are all integers. For each row, I have to find the greatest value among d1, d2, ..., d34 and put it in the maxd column. Oracle code:

UPDATE documents set maxd = GREATEST(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,d21,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31,d32,d33,d34);

I have created a DataFrame containing the exact same set of 37 columns in Spark 1.6. Now I need to write code that calculates the maximum value of d1, d2, ..., d34 for each row and stores it in the maxd column. Please help.

Upvotes: 1

Views: 451

Answers (1)

Bala

Reputation: 11274

This works in Spark 1.6:

>>> from pyspark.sql.functions import greatest
>>> df = sqlContext.createDataFrame([(10, 400, 30)], ['x', 'y', 'z'])

>>> df.show()
+---+---+---+
|  x|  y|  z|
+---+---+---+
| 10|400| 30|
+---+---+---+

>>> df.select(greatest(df.x, df.y, df.z).alias('greatest')).show()
+--------+
|greatest|
+--------+
|     400|
+--------+

Update

I have just learnt that you can pass greatest(*df.columns) instead of listing df.x, df.y, and so on. This is very useful when the number of columns is large. So the above could be rewritten as:

>>> df.select(greatest(*df.columns).alias('greatest')).show()
+--------+
|greatest|
+--------+
|     400|
+--------+
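For the table in the question, note that greatest(*df.columns) would also pull in year and month (and the blank maxd), so you would want to restrict it to the d-columns. A minimal sketch, assuming the d1..d34 column names from the question:

>>> from pyspark.sql.functions import greatest
>>> d_cols = ['d%d' % i for i in range(1, 35)]  # 'd1' .. 'd34'; skips year, month, maxd
>>> df = df.withColumn('maxd', greatest(*d_cols))  # replaces the blank maxd column

greatest accepts column names as strings, and withColumn replaces an existing column with the same name, so this fills maxd for every row in one pass.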

Upvotes: 0
