Reputation: 361
I was reading the DataFrame section of the official PySpark API reference, and the code snippet below for the transform function confuses me. I can't figure out why * is placed before the sorted function in sort_columns_asc, defined below.
from pyspark.sql.functions import col
df = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["int", "float"])
def cast_all_to_int(input_df):
    return input_df.select([col(col_name).cast("int") for col_name in input_df.columns])
def sort_columns_asc(input_df):
    return input_df.select(*sorted(input_df.columns))
df.transform(cast_all_to_int).transform(sort_columns_asc).show()
+-----+---+
|float|int|
+-----+---+
|    1|  1|
|    2|  2|
+-----+---+
Please help me clarify the confusion.
Upvotes: 0
Views: 1704
Reputation: 1616
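The * unpacks an iterable (a list, tuple, etc.) into separate positional arguments in a function call. If the iterable is nested, only the outermost level is unpacked.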
# 1D Array
collection1 = [1,2,3,4]
print(*collection1)
1 2 3 4
# 2D Array
collection2 = [[1,2,3,4]]
print(*collection2)
[1, 2, 3, 4]
In your example, you are unpacking the sorted column names, going from
example = ["int", "float"]
to
print(*sorted(example))
float int
Check out this for further information.
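To make the connection to select explicit, here is a minimal sketch that runs without Spark. The select function below is a hypothetical stand-in I made up for illustration (it is not PySpark's implementation); it just returns whatever positional arguments it receives, so you can see what * changes in the call.

```python
def select(*cols):
    """Hypothetical stand-in for DataFrame.select (an assumption for
    illustration): returns the positional arguments it was called with."""
    return cols

columns = ["int", "float"]

# With *, each sorted name becomes its own positional argument.
unpacked = select(*sorted(columns))
# This is the same call written out by hand.
explicit = select("float", "int")
# Without *, the whole list arrives as a single argument.
as_list = select(sorted(columns))

print(unpacked)              # ('float', 'int')
print(unpacked == explicit)  # True
print(as_list)               # (['float', 'int'],)
```

So select(*sorted(input_df.columns)) is the same call as select("float", "int"). Note that the real DataFrame.select also accepts a single list, which is why cast_all_to_int in your snippet works without the *.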
Upvotes: 1