Makar Nikitin

Reputation: 329

Pyspark create combinations from list

Say I have a DataFrame:

df = spark.createDataFrame([['some_string', 'A'],['another_string', 'B']],['a','b'])
           a               |     b
---------------------------+------------
 some_string               |     A
 another_string            |     B

And I have a list of ints like [1,2,3]. What I want is to add this list as a column to my DataFrame, producing one row per list element:

           a               |     b     |     c      
---------------------------+-----------+------------
 some_string               |     A     |     1      
 some_string               |     A     |     2      
 some_string               |     A     |     3      
 another_string            |     B     |     1      
 another_string            |     B     |     2      
 another_string            |     B     |     3      

Is there any way to do it without a UDF?

Upvotes: 0

Views: 716

Answers (2)

murtihash

Reputation: 8410

You could also just use explode and avoid the unnecessary shuffle caused by joins.

from pyspark.sql import functions as F

ints = [1, 2, 3]

# Build an array column of literals and explode it into one row per element.
df.withColumn("c", F.explode(F.array(*[F.lit(x) for x in ints]))).show()

#+--------------+---+---+
#|             a|  b|  c|
#+--------------+---+---+
#|   some_string|  A|  1|
#|   some_string|  A|  2|
#|   some_string|  A|  3|
#|another_string|  B|  1|
#|another_string|  B|  2|
#|another_string|  B|  3|
#+--------------+---+---+

Upvotes: 1

s.polam

Reputation: 10382

Use crossJoin. Please check the code below.
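Here dfb is a single-column DataFrame built from the list of ints; the answer does not show how it was created, but a minimal sketch of one way to build it (the column name id is an assumption chosen to match the output below) is:

>>> ints = [1, 2, 3]
>>> # each list element becomes a one-field row; 'id' is the assumed column name
>>> dfb = spark.createDataFrame([(x,) for x in ints], ['id'])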

>>> dfa.show()
+--------------+---+
|             a|  b|
+--------------+---+
|   some_string|  A|
|another_string|  B|
+--------------+---+

>>> dfb.show()
+---+
| id|
+---+
|  1|
|  2|
|  3|
+---+

>>> dfa.crossJoin(dfb).show()
+--------------+---+---+
|             a|  b| id|
+--------------+---+---+
|   some_string|  A|  1|
|   some_string|  A|  2|
|   some_string|  A|  3|
|another_string|  B|  1|
|another_string|  B|  2|
|another_string|  B|  3|
+--------------+---+---+

Upvotes: 2
