Reputation: 15
Table in oracle has 37 columns. names of columns are: year,month,d1,d2....d34. Data in d1..d34 are all integers. There is one more column called maxd which is blank. For each row, I have to find the greatest value out of d1,d2....d34 and put that in maxd column. Oracle code:
UPDATE documents set maxd = GREATEST(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,d21,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31,d32,d33,d34);
I have created a dataframe containing the exact set of 37 columns in spark1.6. Now, need to write code to calculate the maximum value in d1,d2...d34 for each row and store it in maxd column. Please help.
Upvotes: 1
Views: 451
Reputation: 11274
This uses Spark 1.6
df = sqlContext.createDataFrame([(10, 400, 30)], ['x', 'y', 'z'])
>>> df.show()
+---+---+---+
| x| y| z|
+---+---+---+
| 10|400| 30|
+---+---+---+
>>> df.select(greatest(df.x,df.y,df.z).alias('greatest')).show()
+--------+
|greatest|
+--------+
| 400|
+--------+
Update
I have just learnt that you could pass greatest(*df.columns)
instead of df.x, df.y and so on
. This is very useful when the no. of columns are huge. So the above could be rewritten as
>>> df.select(greatest(*df.columns).alias('greatest')).show()
+--------+
|greatest|
+--------+
| 400|
+--------+
Upvotes: 0