Yas
Yas

Reputation: 25

Find the position of a value in an array and use the position to get a value from array in other column

I have a table, where I would like to loop through the array in column "Type". If it contains 'cover', take the value from the "ID" array of the same position. And if there are 2 'cover' values, check the "style" column - if it is 'multi', ignore and take other value position.

ID Type style
[222, 345, 678] ['leg','cover','cover'] ['modern','multi','traditional']
[989, 787, 125] ['cover','hanger','comp'] ['modern','modern','modern']

The desired output:

ID Type style output
[222, 345, 678] ['leg','cover','cover'] ['modern','multi','traditional'] 678
[989, 787, 125] ['cover','hanger','comp'] ['modern','modern','modern'] 989

I am using this code to get the position but I am stuck with the rest.

df.select(df.Type, array_position(df.Type, "cover").alias('a_pos')).show()

Upvotes: 2

Views: 146

Answers (1)

ZygD
ZygD

Reputation: 24366

You could create a zipped array and filter out elements which you don't need.

Input:

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [([222, 345, 678], ['leg','cover','cover'], ['modern','multi','traditional']),
     ([989, 787, 125], ['cover','hanger','comp'], ['modern','modern','modern']),
     ([123, 234, 345], ['cover','hanger','comp'], ['multi','modern','modern']),
     ([456, 567, 678], ['couch','hanger','comp'], ['modern','modern','modern'])],
    ['ID', 'Type', 'style'])

Script:

filtered = F.filter(
    F.arrays_zip('ID', 'Type', 'style'),
    lambda x: (x.Type == 'cover') & (x.style != 'multi')
)
df = df.withColumn('output', filtered[0].ID)

df.show(truncate=0)
# +---------------+---------------------+----------------------------+------+
# |ID             |Type                 |style                       |output|
# +---------------+---------------------+----------------------------+------+
# |[222, 345, 678]|[leg, cover, cover]  |[modern, multi, traditional]|678   |
# |[989, 787, 125]|[cover, hanger, comp]|[modern, modern, modern]    |989   |
# |[123, 234, 345]|[cover, hanger, comp]|[multi, modern, modern]     |null  |
# |[456, 567, 678]|[couch, hanger, comp]|[modern, modern, modern]    |null  |
# +---------------+---------------------+----------------------------+------+

Upvotes: 1

Related Questions