user2335580

Reputation: 408

Obtain last element of list in data frame column

My Spark DataFrame has 3 columns, each of which holds a list (array). The length of the list can vary from row to row. For example, my DataFrame looks like this:

Input

I would like to obtain the last element of each of these lists.

Expected output

There was a post showing how to obtain the first element of the list:

df = df.withColumn("First_item_Col1", df['Col1'][0])

But when I replace 0 with -1 in the line above to get the last item, I get null values.

Upvotes: 1

Views: 808

Answers (2)

dsk

Reputation: 2003

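You can use the Spark SQL collection function element_at (available since Spark 2.4) to get the last element of an array column. Its index is 1-based, and a negative index counts from the end of the array, so -1 returns the last element: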

Create the dataframe

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, ['x', 'x']), (2, ['y']), (3, ['x', 'y', 'z']), (4, ['x', 'y', 'y', 'z'])], ["col1", "col2"])
df.show(truncate=False)
+----+------------+
|col1|col2        |
+----+------------+
|1   |[x, x]      |
|2   |[y]         |
|3   |[x, y, z]   |
|4   |[x, y, y, z]|
+----+------------+

Solution

from pyspark.sql import functions as F
# element_at is 1-based; -1 selects the last element of each array
df = df.withColumn("list_col", F.element_at(F.col('col2'), -1))
df.show(truncate=False)
+----+------------+--------+
|col1|col2        |list_col|
+----+------------+--------+
|1   |[x, x]      |x       |
|2   |[y]         |y       |
|3   |[x, y, z]   |z       |
|4   |[x, y, y, z]|z       |
+----+------------+--------+

Upvotes: 0

IoaTzimas

Reputation: 10624

If your data is in a pandas DataFrame (this does not work on a Spark DataFrame), you can apply a lambda function to Col1 and Col2 to get the last items:

df['Last_Col1']=df['Col1'].apply(lambda x: x[-1])
df['Last_Col2']=df['Col2'].apply(lambda x: x[-1])

Output:

>>> print(df)

           Col1          Col2 Last_Col1 Last_Col2
0        [X, X]        [A, B]         X         B
1           [Y]           [B]         Y         B
2     [X, Y, Z]        [A, C]         Z         C
3  [X, Y, Y, Z]  [A, B, B, C]         Z         C
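One caveat with x[-1]: it raises IndexError if any row holds an empty list. A sketch using pandas' .str accessor instead, which also indexes into lists element-wise but returns NaN for an empty list; the data below is hypothetical:

```python
import pandas as pd

# Hypothetical data, including an empty list in the last row.
df = pd.DataFrame({
    "Col1": [["X", "X"], ["Y"], ["X", "Y", "Z"], []],
})

# .str[-1] takes the last element of each list; empty lists give NaN
# instead of raising IndexError like lambda x: x[-1] would.
df["Last_Col1"] = df["Col1"].str[-1]
print(df)
```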

Upvotes: 2
