xxxerneaxx

Reputation: 49

Get first element in array PySpark

I want to add two new columns containing the first and second values of the Services array, but I'm getting the error:

Field name should be String Literal, but it's 0;

production_target_datasource_df.withColumn("newcol",production_target_datasource_df["Services"].getItem(0))
    +------------------+--------------------+
    |         cid      |            Services|
    +------------------+--------------------+
    |845124826013182686|     [112931, serv1]|
    |845124826013182686|     [146936, serv1]|
    |845124826013182686|      [32718, serv2]|
    |845124826013182686|      [28839, serv2]|
    |845124826013182686|       [8710, serv2]|
    |845124826013182686|    [2093140, serv3]|

Upvotes: 4

Views: 11725

Answers (2)

MichaelU

Reputation: 83

As the error says, you need to pass a string, not 0. So which string should you pass?

If you follow @pault's advice and call printSchema(), you will see which keys (field names) correspond to the values in the list.

The documentation of getItem can help you figure this out.

Another way to find out what to pass is to simply pass any string, for example:

production_target_datasource_df.withColumn("newcol",production_target_datasource_df["Services"].getItem('0'))

The resulting error in the logs will tell you which keys were expected.

Hope this helps ;)

Upvotes: 1

Cena

Reputation: 3419

You don't have to use .getItem(0); production_target_datasource_df["Services"][0] is enough.

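# Constructing your table:
from pyspark.sql import Row, SparkSession

# In the pyspark shell `sc` already exists; in a script, create it:
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

df = sc.parallelize([Row(cid=1, Services=["2", "serv1"]),
                     Row(cid=1, Services=["3", "serv1"]),
                     Row(cid=1, Services=["4", "serv2"])]).toDF()
df.show()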
+---+----------+
|cid|  Services|
+---+----------+
|  1|[2, serv1]|
|  1|[3, serv1]|
|  1|[4, serv2]|
+---+----------+

# Adding the two columns:
new_df = df.withColumn("first_element", df.Services[0])
new_df = new_df.withColumn("second_element", df.Services[1])
new_df.show()

+---+----------+-------------+--------------+
|cid|  Services|first_element|second_element|
+---+----------+-------------+--------------+
|  1|[2, serv1]|            2|         serv1|
|  1|[3, serv1]|            3|         serv1|
|  1|[4, serv2]|            4|         serv2|
+---+----------+-------------+--------------+

Upvotes: 4
