wrek

Reputation: 1161

pyspark: take the min or max value across a row?

I have the following values:

A | B | C
---------
1 | 2 | 3
2 | 3 | 6
3 | 5 | 4

I want to take the row-wise minimum of columns B and C, so that the result is:

A | min(B,C)
------------
1 | 2
2 | 3
3 | 4

How do I do this with a PySpark DataFrame?

Upvotes: 0

Views: 2346

Answers (1)

Rakesh Kumar

Reputation: 4420

The PySpark API docs list all the available functions along with their documentation, so they are the best place to look things like this up. In the example below, I use least for the row-wise minimum and greatest for the row-wise maximum.

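from pyspark.sql import functions as F

# sqlContext is predefined in the Spark 1.x pyspark shell;
# on Spark 2+ use spark.createDataFrame instead.
df = sqlContext.createDataFrame([
    [1, 2, 3],
    [2, 3, 6],
    [3, 5, 4]
], ['A', 'B', 'C'])

# Compare every column after 'A' (here B and C) row by row.
df.withColumn(
    "max",
    F.greatest(*[F.col(cl) for cl in df.columns[1:]])
).withColumn(
    "min",
    F.least(*[F.col(cl) for cl in df.columns[1:]])
).show()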
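
If it helps to see what happens per row, here is a plain-Python sketch of the semantics (not Spark code, and not how Spark implements it). It assumes the documented behavior that least and greatest skip nulls and return null only when every input is null. The helper names row_least and row_greatest are made up for illustration.

```python
from typing import Optional, Sequence


def row_least(values: Sequence[Optional[int]]) -> Optional[int]:
    """Smallest non-null value in one row, or None if every value is null."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def row_greatest(values: Sequence[Optional[int]]) -> Optional[int]:
    """Largest non-null value in one row, or None if every value is null."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Same data as in the question: columns A, B, C.
rows = [
    (1, 2, 3),
    (2, 3, 6),
    (3, 5, 4),
]

# For each row keep A, then the min and max taken across B and C.
result = [(a, row_least((b, c)), row_greatest((b, c))) for a, b, c in rows]

for a, lo, hi in result:
    print(a, lo, hi)
```

The min column here matches the table the question asks for. In Spark itself you would keep only A and the minimum by selecting just those two columns instead of chaining withColumn.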

PySpark API link: https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.DataFrame

Upvotes: 1
