Makar Nikitin

Reputation: 329

Pyspark create combinations from list

Say I have a DataFrame:

df = spark.createDataFrame([['some_string', 'A'],['another_string', 'B']],['a','b'])
           a               |     b
---------------------------+------------
 some_string               |     A
 another_string            |     B

And I have a list of ints like [1,2,3]. What I want is to add this list as a column to my DataFrame, producing one row per list element:

           a               |     b     |     c      
---------------------------+-----------+------------
 some_string               |     A     |     1      
 some_string               |     A     |     2      
 some_string               |     A     |     3      
 another_string            |     B     |     1      
 another_string            |     B     |     2      
 another_string            |     B     |     3      

Is there any way to do it without a UDF?

Upvotes: 0

Views: 716

Answers (2)

murtihash

Reputation: 8410

You could also just use explode and avoid the unnecessary shuffle caused by joins.

from pyspark.sql import functions as F

ints = [1, 2, 3]

# Build an array column of literals and explode it into one row per element.
df.withColumn("c", F.explode(F.array(*[F.lit(x) for x in ints]))).show()

#+--------------+---+---+
#|             a|  b|  c|
#+--------------+---+---+
#|   some_string|  A|  1|
#|   some_string|  A|  2|
#|   some_string|  A|  3|
#|another_string|  B|  1|
#|another_string|  B|  2|
#|another_string|  B|  3|
#+--------------+---+---+

Upvotes: 1

s.polam

Reputation: 10382

Use crossJoin. Please check the code below.
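Here dfb is a single-column DataFrame built from the list of ints; the answer does not show how it was created, but a minimal sketch of one way to build it (the column name id is an assumption chosen to match the output below) is:

>>> ints = [1, 2, 3]
>>> # each list element becomes a one-field row; 'id' is the assumed column name
>>> dfb = spark.createDataFrame([(x,) for x in ints], ['id'])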

>>> dfa.show()
+--------------+---+
|             a|  b|
+--------------+---+
|   some_string|  A|
|another_string|  B|
+--------------+---+

>>> dfb.show()
+---+
| id|
+---+
|  1|
|  2|
|  3|
+---+

>>> dfa.crossJoin(dfb).show()
+--------------+---+---+
|             a|  b| id|
+--------------+---+---+
|   some_string|  A|  1|
|   some_string|  A|  2|
|   some_string|  A|  3|
|another_string|  B|  1|
|another_string|  B|  2|
|another_string|  B|  3|
+--------------+---+---+

Upvotes: 2
