Reputation: 551
I have a data similar to the below one:
Col1,col2,col3
a,1,#
b,2,$
c,3,%
I need to create a new column with col2 as key and col3 as value, similar to below:
Col1,col2,col3,col4
a,1,#,{1:#}
b,2,$,{2:$}
c,3,%,{4:%}
How can I achieve this using pyspark?
Upvotes: 0
Views: 790
Reputation: 42422
Try format_string
:
import pyspark.sql.functions as F
df2 = df.withColumn('col4', F.format_string('{%d:%s}', 'col2', 'col3'))
df2.show()
+----+----+----+-----+
|Col1|col2|col3| col4|
+----+----+----+-----+
| a| 1| #|{1:#}|
| b| 2| $|{2:$}|
| c| 3| %|{3:%}|
+----+----+----+-----+
If you want a key-value relationship, maps might be more appropriate:
df2 = df.withColumn('col4', F.create_map('col2', 'col3'))
df2.show()
+----+----+----+--------+
|Col1|col2|col3| col4|
+----+----+----+--------+
| a| 1| #|[1 -> #]|
| b| 2| $|[2 -> $]|
| c| 3| %|[3 -> %]|
+----+----+----+--------+
You can also convert the map to a JSON string, similar to your expected output:
df2 = df.withColumn('col4', F.to_json(F.create_map('col2', 'col3')))
df2.show()
+----+----+----+---------+
|Col1|col2|col3| col4|
+----+----+----+---------+
| a| 1| #|{"1":"#"}|
| b| 2| $|{"2":"$"}|
| c| 3| %|{"3":"%"}|
+----+----+----+---------+
Upvotes: 1