Reputation:
I have a dataframe like:
name     address      result
rishi    los angeles  true
tushar   california   false
keerthi  texas        false
I want to go through each row of the dataframe and check whether the result value is "true" or "false". If it is true, I want to copy the address into a new column, address_new; if it is false, I want address_new to be null.
How can I achieve this using PySpark?
The result should be:
name     address      result  address_new
rishi    los angeles  true    los angeles
tushar   california   false   null
keerthi  texas        false   null
Upvotes: 1
Views: 278
Reputation: 85
This line should work to filter the data (note this is pandas syntax, not PySpark):
new_df = df[df['result'] == True]
For the new address column, a list comprehension could be used; the comparison has to be against the row's result field, not the whole row:
df['address_new'] = [df.loc[i, 'address'] if df.loc[i, 'result'] == True else None for i in range(df.shape[0])]
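If pandas is acceptable despite the question asking for PySpark, a vectorized sketch avoids the row-by-row indexing entirely. This assumes the result column holds booleans and uses Series.where(), which keeps values where the condition is true and substitutes the alternative elsewhere:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["rishi", "tushar", "keerthi"],
    "address": ["los angeles", "california", "texas"],
    "result": [True, False, False],
})

# where() keeps address where result is True and fills None elsewhere.
df["address_new"] = df["address"].where(df["result"], None)
```

This is both shorter and faster than iterating with df.loc, since the whole column is computed in one pass.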
Upvotes: 1
Reputation: 121
You can use when() for this purpose; I'd suggest reading up on the basics of PySpark. In your case it would be something like (note that since result holds the strings "true"/"false", you compare against the string, and that null is written as None in Python):
from pyspark.sql.functions import when

df = df.withColumn("address_new", when(df.result == "true", df.address)
                                  .otherwise(None))
Upvotes: 0