Sean Fu

Reputation: 19

How to combine multiple columns into one in PySpark

I have a DataFrame with two columns (df1). I want to merge the values of both columns into a single column (df2). How can I do this?

Upvotes: 1

Views: 2411

Answers (3)

Luiz Viola

Reputation: 2436

from pyspark.sql.functions import concat

df1.withColumn("Merge", concat(df1.Column_1, df1.Column_2)).show()

Upvotes: 1

Pav3k

Reputation: 909

Let's say you have a DataFrame like this:

from pyspark.sql import functions as F

# Assumes an active SparkSession named `spark`
d = [
    ("Value 1", 1),
    ("Value 2", 2),
    ("Value 3", 3),
    ("Value 4", 4),
    ("Value 5", 5),
]
df = spark.createDataFrame(d, ["col1", "col2"])
df.show()

# output
+-------+----+
|   col1|col2|
+-------+----+
|Value 1|   1|
|Value 2|   2|
|Value 3|   3|
|Value 4|   4|
|Value 5|   5|
+-------+----+

You can join the columns and format them however you want using the following syntax:

(
    df.withColumn("newCol", 
                  F.format_string("Col 1: %s Col 2: %s", df.col1, df.col2))
    .show(truncate=False)
)

# output
+-------+----+-----------------------+
|col1   |col2|newCol                 |
+-------+----+-----------------------+
|Value 1|1   |Col 1: Value 1 Col 2: 1|
|Value 2|2   |Col 1: Value 2 Col 2: 2|
|Value 3|3   |Col 1: Value 3 Col 2: 3|
|Value 4|4   |Col 1: Value 4 Col 2: 4|
|Value 5|5   |Col 1: Value 5 Col 2: 5|
+-------+----+-----------------------+

Upvotes: 1

Thijs

Reputation: 296

You can use a struct or a map.

struct:

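from pyspark.sql import functions as F

# Pack the amount (scaled by 100), the currency, and a unit label into one struct column
df.withColumn(
    "price_struct",
    F.struct(
        (F.col("total_price") * 100).alias("amount"),
        "total_price_currency",
        F.lit("CENTI").alias("unit"),
    ),
)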

results in

+-----------+--------------------+--------------------+
|total_price|total_price_currency|        price_struct|
+-----------+--------------------+--------------------+
|       79.0|                 USD|[7900.0, USD, CENTI]|
+-----------+--------------------+--------------------+

or as a map

# Keys and values alternate: key1, value1, key2, value2, ...
df.withColumn(
    "price_map",
    F.create_map(
        F.lit("currency"), F.col("total_price_currency"),
        F.lit("amount"), F.col("total_price") * 100,
        F.lit("unit"), F.lit("CENTI"),
    ),
)

results in

+-----------+--------------------+--------------------+
|total_price|total_price_currency|           price_map|
+-----------+--------------------+--------------------+
|       79.0|                 USD|[currency -> USD,...|
+-----------+--------------------+--------------------+

Upvotes: 0
