user3448011

Reputation: 1599

How to split one Spark dataframe column into two columns with a conditional when

I would like to split a column of a PySpark dataframe into two columns.

the dataframe:

   price
   90.16|USD  

I need:

  dollar_price currency
  90.16        USD

Pyspark code:

   new_col = F.when(F.col("price").isNull() == False, F.substring(F.col('price'), 1, F.instr(F.col('retail_value'), '|')-1)).otherwise(null)

   new_df = df.withColumn('dollar_price', new_col)

   new_col = F.when(F.col("price").isNull() == False, F.substring(F.col('price'), F.instr(F.col('retail_value'), '|')+1, 3)).otherwise(null)

   new_df_1 = new_df.withColumn('currency', new_col)

I got this error:

  TypeError: Column is not iterable

Could you please tell me what I missed?

I have tried the approach from Split a dataframe column's list into two dataframe columns, but it does not work.

thanks

Upvotes: 2

Views: 768

Answers (1)

notNull

Reputation: 31540

Try with expr, since you are computing the start position from the instr function. The substring() function in the DataFrame API expects plain integer values for its pos and len arguments, so passing it a Column built from instr() is what raises the TypeError; wrapping the whole expression in expr() lets Spark SQL evaluate instr() inside substring().

Example:

df.show()
#+---------+
#|    price|
#+---------+
#|90.16|USD|
#+---------+

from pyspark.sql.functions import *
from pyspark.sql.types import *

df.withColumn("dollar_price",when(col("price").isNull()==False,expr("substring(price,1,instr(price,'|')-1)")).otherwise(None)).\
withColumn("currency",when(col("price").isNull()==False,expr("substring(price,instr(price,'|')+1,3)")).otherwise(None)).\
show()

#+---------+------------+--------+
#|    price|dollar_price|currency|
#+---------+------------+--------+
#|90.16|USD|       90.16|     USD|
#+---------+------------+--------+
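As a side note (not part of the original answer), here is a minimal sketch of the same split using split() on the literal | delimiter, which avoids instr() and the expr() workaround entirely; nulls in price simply propagate through getItem():

from pyspark.sql.functions import split, col

#split on the literal '|' (escaped, because split() takes a regex pattern)
df.withColumn("parts", split(col("price"), "\\|")).\
withColumn("dollar_price", col("parts").getItem(0)).\
withColumn("currency", col("parts").getItem(1)).\
drop("parts").\
show()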

Upvotes: 2
