Reputation: 15
I have the following dataframe, which contains a column of arrays (col1). I need to get the index of the element that contains a certain substring ("58=").
+-----------------------------------------------------------+-----+
| col1 |a_pos|
+-----------------------------------------------------------+-----+
|[8=FIX.4.4, 55=ITUBD264, 58=AID[43e39b2e-c6e2-4947] | 0|
+-----------------------------------------------------------+-----+
I've tried to use array_position(col1, "58="), but it seems to work only with exact matches, not substrings.
In Python I do exactly this with pandas, using the following code:
df['idx'] = [max(range(len(l)), key=lambda x: '58=' in l[x]) for l in df['col1']]
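One caveat with the one-liner above: max(...) returns 0 when no element contains the substring, which looks the same as a match at position 0, and it raises ValueError on an empty list. A minimal plain-Python sketch of a variant that returns -1 in both cases (the helper name first_index_containing is hypothetical, not part of pandas):

```python
def first_index_containing(items, needle):
    """Return the 0-based index of the first element containing needle, or -1."""
    # enumerate yields (index, element); next() stops at the first hit
    # and falls back to -1 when nothing matches or the list is empty.
    return next((i for i, s in enumerate(items) if needle in s), -1)


row = ["8=FIX.4.4", "55=ITUBD264", "58=AID[43e39b2e-c6e2-4947]"]
idx = first_index_containing(row, "58=")      # element at position 2 matches
missing = first_index_containing(row, "99=")  # no element matches
empty = first_index_containing([], "58=")     # empty list is handled too
```

With pandas this could be applied as df['idx'] = [first_index_containing(l, '58=') for l in df['col1']].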
Upvotes: 0
Views: 673
Reputation: 26676
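Check for the existence of 58 using the rlike function inside a higher-order function (transform), then determine the position using array_position. Note that array_position is 1-based and returns 0 when no element matches. Code below: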
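from pyspark.sql.functions import expr
# transform maps each element to true/false; array_position finds the first true (1-based)
df = df.withColumn('index', expr("array_position(transform(col1, x -> rlike(x, '58=')), true)"))
df.show(truncate=False)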
+---------------------------------------------------+-----+-----+
|col1 |a_pos|index|
+---------------------------------------------------+-----+-----+
|[8=FIX.4.4, 55=ITUBD264, 58=AID[43e39b2e-c6e2-4947]|0 |3 |
+---------------------------------------------------+-----+-----+
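To make the result of 3 concrete, here is a rough plain-Python model of what the expression computes. It is only a sketch: it assumes rlike behaves like a regex search anywhere in the string, it ignores null handling, and the function name array_position_of_match is made up for illustration.

```python
import re


def array_position_of_match(items, pattern):
    """Model of array_position(transform(items, x -> rlike(x, pattern)), true)."""
    # transform + rlike: one boolean per element, true if the regex is found anywhere
    flags = [re.search(pattern, s) is not None for s in items]
    # array_position: 1-based index of the first true, 0 if there is none
    return flags.index(True) + 1 if True in flags else 0


row = ["8=FIX.4.4", "55=ITUBD264", "58=AID[43e39b2e-c6e2-4947]"]
spark_style = array_position_of_match(row, "58=")  # 1-based, like the 'index' column
zero_based = spark_style - 1                       # comparable to the pandas index
not_found = array_position_of_match(row, "99=")    # 0 means no match
```

Subtract 1 from the Spark result to compare it with the 0-based pandas index. Also note that a pattern of just 58, without the "=", would match an element such as "158=x" as well, so including the "=" in the pattern is the safer choice.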
Upvotes: 1