Reputation: 311
I have a Spark DataFrame and I want to add a few columns if they don't already exist.
df1:
id Name age
1 Abc 20
2 def 30
I want to check whether the columns 'gender', 'city', and 'contact' already exist in df1. Any that don't exist should be added and populated with null values, so that I finally obtain:
df1:
id Name age gender city contact
1 Abc 20
2 def 30
Upvotes: 2
Views: 3969
Reputation: 943
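You can do it as shown below: check row_df.columns for each name and add the column with withColumn only if it is missing. In this example 'gender' already exists, so only 'city' and 'contact' are added.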
from pyspark.sql import Row
from pyspark.sql import functions as F

# Assumes an active SparkSession named spark (as in the pyspark shell).
row = Row('id', 'Name', 'age', 'gender')
row_df = spark.createDataFrame(
    [row(1, 'Test', '12', 'Male'), row(2, 'Test2', '15', 'Female')])
row_df.show()

# Add each column only if it is missing, filled with nulls.
if 'gender' not in row_df.columns:
    row_df = row_df.withColumn('gender', F.lit(None))
if 'city' not in row_df.columns:
    row_df = row_df.withColumn('city', F.lit(None))
if 'contact' not in row_df.columns:
    row_df = row_df.withColumn('contact', F.lit(None))
row_df.show()
Output:
+---+-----+---+------+
| id| Name|age|gender|
+---+-----+---+------+
|  1| Test| 12|  Male|
|  2|Test2| 15|Female|
+---+-----+---+------+
+---+-----+---+------+----+-------+
| id| Name|age|gender|city|contact|
+---+-----+---+------+----+-------+
|  1| Test| 12|  Male|null|   null|
|  2|Test2| 15|Female|null|   null|
+---+-----+---+------+----+-------+
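The three if blocks above repeat the same check, so they could be folded into a small helper. The following is only a sketch under stated assumptions: add_missing_columns, wanted, and dtype are made-up names, not part of PySpark. It also swaps withColumn and F.lit(None) for a single selectExpr projection with a SQL CAST, so the new columns get a concrete type (string by default) rather than an untyped null, which some writers may not accept.

```python
def add_missing_columns(df, wanted, dtype="string"):
    """Return df with every name in `wanted` present as a column.

    Hypothetical helper (not a PySpark API): columns that are missing
    are appended as typed nulls; existing columns are left untouched.
    """
    existing = set(df.columns)
    # dict.fromkeys drops duplicate names while keeping their order.
    missing = [name for name in dict.fromkeys(wanted) if name not in existing]
    if not missing:
        return df  # every wanted column is already there

    # One projection adds all missing columns at once; the CAST gives
    # each null column a concrete type instead of an untyped null.
    null_exprs = [f"CAST(NULL AS {dtype}) AS `{name}`" for name in missing]
    return df.selectExpr("*", *null_exprs)
```

Called as add_missing_columns(row_df, ['gender', 'city', 'contact']), this should add only 'city' and 'contact', since 'gender' is already present. Note that the membership test on df.columns is an exact, case-sensitive string comparison.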
Upvotes: 3