isilia

Reputation: 361

transform function in pyspark

I was reading the official PySpark API reference for DataFrame, and the code snippet below for the transform function has me confused. I can't figure out why * is placed before the sorted function in the sort_columns_asc function defined below.

from pyspark.sql.functions import col
df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
def cast_all_to_int(input_df):
    return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])
def sort_columns_asc(input_df):
    return input_df.select(*sorted(input_df.columns))
df.transform(cast_all_to_int).transform(sort_columns_asc).show()
+-----+---+
|float|int|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+

Please help me clarify the confusion.

Upvotes: 0

Views: 1704

Answers (1)

JAdel

Reputation: 1616

It's used to unpack an iterable (a list, tuple, and so on) into separate positional arguments. Only one level of nesting is removed.

# 1D Array
collection1 = [1,2,3,4]
print(*collection1)
1 2 3 4

# 2D Array
collection2 = [[1,2,3,4]]
print(*collection2)
[1, 2, 3, 4]

In your example, you are unpacking the column names from

example = ["int", "float"]

to

print(*sorted(example))
float int
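
To see what select receives in each case, here is a minimal sketch in plain Python. fake_select is a hypothetical stand-in I made up for illustration; it is not part of PySpark and only returns whatever positional arguments it was given.

```python
# Hypothetical stand-in for DataFrame.select, used only to show what a
# function with a *cols signature receives. This is not PySpark code.
def fake_select(*cols):
    return cols

columns = ["int", "float"]

# With *, each sorted name becomes its own positional argument,
# exactly as if you had written fake_select("float", "int").
unpacked = fake_select(*sorted(columns))
print(unpacked)      # ('float', 'int')

# Without *, the whole list arrives as one single argument.
not_unpacked = fake_select(sorted(columns))
print(not_unpacked)  # (['float', 'int'],)
```

As far as I know, the real select also accepts a single list of column names, which is what cast_all_to_int in your snippet relies on, so the * in sort_columns_asc looks like a style choice rather than a requirement.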

Check out this for further information.

Upvotes: 1
