Reputation: 91
I wonder if there is a way to automate that... I want to make a function to which I can tell how many columns to join. If I have a DataFrame with 3 columns and pass the parameter "number_of_columns=3", then it will join columns: 0, 1, 2. But if I have a DataFrame with 7 columns and pass the parameter "number_of_columns=7", then it will join columns: 0, 1, 2, 3, 4, 5, 6. The column names are always the same: from "0" to "number_of_columns-1".
Is there any way to do that? Or do I have to write another function whenever I have a different number of columns to merge?
from pyspark.sql.functions import concat_ws, col

def my_function(spark_column, name_of_column):
    # Concatenate the integer values of columns "0" through "6" into one string column
    new_spark_column = spark_column.withColumn(name_of_column, concat_ws("",
        col("0").cast("Integer"),
        col("1").cast("Integer"),
        col("2").cast("Integer"),
        col("3").cast("Integer"),
        col("4").cast("Integer"),
        col("5").cast("Integer"),
        col("6").cast("Integer")))
    return new_spark_column
Upvotes: 2
Views: 69
Reputation: 42392
You can use a list comprehension to do this:
from pyspark.sql.functions import concat_ws, col
def my_function(spark_column, n_cols, name_of_column):
    new_spark_column = spark_column.withColumn(
        name_of_column,
        concat_ws("", *[col(c).cast("Integer") for c in spark_column.columns[:n_cols]])
    )
    return new_spark_column
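For example, here is a minimal sketch of how you would call it, assuming a running SparkSession named spark and a hypothetical 3-column DataFrame whose columns are named "0", "1", and "2":
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data: three integer columns named "0", "1", "2"
df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], schema=["0", "1", "2"])

result = my_function(df, n_cols=3, name_of_column="joined")
result.show()
# The "joined" column should contain "123" and "456"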
Upvotes: 1