Reputation: 1161
I have the following values:
- - - - - -
A| B | C|
- - - - - -
1| 2 | 3|
2| 3 | 6|
3| 5 | 4|
I want to take the row-wise minimum of columns B and C, so that the result is:
- - - - - -
A| min(B,C)
- - - - - -
1| 2
2| 3
3| 4
How do I do this with a PySpark DataFrame?
Upvotes: 0
Views: 2346
Reputation: 4420
The PySpark API docs list every available function along with its documentation, so they are the best place to check for things like this. In the example below, I used least for the row-wise minimum and greatest for the row-wise maximum.
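from pyspark.sql import functions as F

# Sample data matching the question; assumes an existing sqlContext, as in the pyspark shell
df = sqlContext.createDataFrame([
    [1, 2, 3],
    [2, 3, 6],
    [3, 5, 4]
], ['A', 'B', 'C'])

# Every column except the first ('A') takes part in the comparison
value_cols = [F.col(cl) for cl in df.columns[1:]]

df.withColumn(
    "max", F.greatest(*value_cols)  # row-wise maximum of B and C
).withColumn(
    "min", F.least(*value_cols)  # row-wise minimum of B and C
).show()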
PySpark API link: https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.DataFrame
Upvotes: 1