Rv R

Reputation: 311

Add columns to a PySpark DataFrame if they do not already exist

I have a Spark DataFrame and I want to add a few columns if they do not already exist.

df1:
id Name age
1  Abc  20
2  def  30

I want to check whether the columns 'gender', 'city', and 'contact' already exist in df1. If they do not, I want to add them, populated with null values, and finally obtain:

df1:
id Name age gender city contact
1  Abc  20  null   null null
2  def  30  null   null null

Upvotes: 2

Views: 3969

Answers (1)

Saurabh

Reputation: 943

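You can do it as shown below. In this example the DataFrame already has a gender column, so only city and contact get added: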

from pyspark.sql import Row, SparkSession
from pyspark.sql import functions as F

# Reuse the active session, or create one when running as a script
spark = SparkSession.builder.getOrCreate()

row = Row('id', 'Name', 'age', 'gender')
row_df = spark.createDataFrame(
    [row(1, 'Test', '12', 'Male'), row(2, 'Test2', '15', 'Female')])
row_df.show()

# Add each column only if it is missing. Casting the null literal gives the
# new column a concrete type (string here) instead of NullType.
for col_name in ['gender', 'city', 'contact']:
    if col_name not in row_df.columns:
        row_df = row_df.withColumn(col_name, F.lit(None).cast('string'))

row_df.show()

Output:

+---+-----+---+------+
| id| Name|age|gender|
+---+-----+---+------+
|  1| Test| 12|  Male|
|  2|Test2| 15|Female|
+---+-----+---+------+

+---+-----+---+------+----+-------+
| id| Name|age|gender|city|contact|
+---+-----+---+------+----+-------+
|  1| Test| 12|  Male|null|   null|
|  2|Test2| 15|Female|null|   null|
+---+-----+---+------+----+-------+
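
If you need this in more than one place, the check can be wrapped in a small helper. The following is only a sketch: the names missing_columns and add_missing_columns and the dtype parameter are made up for illustration and are not part of PySpark. It compares names case-insensitively because, as far as I know, Spark resolves column names case-insensitively by default (spark.sql.caseSensitive is false), so calling withColumn('Gender', ...) on a DataFrame that already has 'gender' would replace the existing column rather than add a new one.

```python
def missing_columns(existing, wanted):
    """Return the names in `wanted` that are absent from `existing`.

    The comparison ignores case (an assumption that matches Spark's
    default, case-insensitive column resolution).
    """
    present = {name.lower() for name in existing}
    return [name for name in wanted if name.lower() not in present]


def add_missing_columns(df, wanted, dtype="string"):
    """Hypothetical helper: add each missing column as a typed null column."""
    # Imported here so missing_columns() stays usable without a Spark installation
    from pyspark.sql import functions as F

    for name in missing_columns(df.columns, wanted):
        # Cast the null literal so the column has a concrete type
        df = df.withColumn(name, F.lit(None).cast(dtype))
    return df
```

With the row_df from the answer, add_missing_columns(row_df, ['gender', 'city', 'contact']) should leave gender untouched and append city and contact as null string columns.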

Upvotes: 3
