user19929902

Reputation:

How to iterate over each row in a PySpark DataFrame

I have a DataFrame like:

name       address     result
rishi     los angeles   true
tushar    california    false
keerthi   texas         false

I want to iterate through each row of the DataFrame and check whether the result value is "true" or "false".

If it is true, I want to copy the address into a new column address_new; if false, I want to set address_new to null.

How can I achieve this using PySpark?

The result should be:

name       address       result   address_new
rishi      los angeles   true     los angeles
tushar     california    false    null
keerthi    texas         false    null
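
For reference, the sample frame can be built like this (a minimal sketch, assuming a local SparkSession and that result is stored as a boolean column):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above; result is assumed boolean here
df = spark.createDataFrame(
    [("rishi", "los angeles", True),
     ("tushar", "california", False),
     ("keerthi", "texas", False)],
    ["name", "address", "result"],
)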

Upvotes: 1

Views: 278

Answers (2)

Joakim Torsvik

Reputation: 85

This line should work to filter the data:

new_df = df[df['result'] == True]

For the new address column, a list comprehension could be used (note that .loc and .shape below are pandas syntax; on a PySpark DataFrame you would need df.toPandas() first):

df['address_new'] = [df.loc[i, 'address'] if df.loc[i, 'result'] == True else None for i in range(df.shape[0])]
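
If you stay in pandas, a vectorized alternative avoids the Python-level loop entirely (a sketch, assuming result is a boolean column):

import numpy as np

# Pick address where result is True, otherwise None
df['address_new'] = np.where(df['result'], df['address'], None)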

Upvotes: 1

FFGH

Reputation: 121

You can use when() for this purpose; I'd suggest reading up on the basics of PySpark. In your case it would be something like:

from pyspark.sql.functions import when

df = df.withColumn("address_new", when(df.result == True, df.address)
                                  .otherwise(None))
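
If you genuinely need to visit rows one at a time, as the title asks, toLocalIterator() or collect() will bring rows to the driver, but a column expression like when() above is the idiomatic way to derive a new column. A sketch of explicit iteration for comparison (assumes the boolean result column from the question):

# Row-by-row iteration pulls data to the driver; only sensible for small frames
for row in df.toLocalIterator():
    address_new = row.address if row.result else None
    print(row.name, address_new)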

Upvotes: 0
