deathrace

Reputation: 1073

How do I split a pyspark.sql.column.Column and get the count of its items?

I am a beginner in Python and Spark.

My Product_Version column usually looks like 87.12.0.12, but sometimes it has only 3 parts: 89.0.0.

I want to split it on the separator into different columns, so I did:

from pyspark.sql.functions import split

cols = split(dfroot_temp["Product_Version"], r"\.")
dfroot_staged_temp = dfroot_temp.withColumn("Ver_Major", cols.getItem(0)) \
    .withColumn("Ver_Minor", cols.getItem(1)) \
    .withColumn("Ver_Build", cols.getItem(3))

Now, when the last part is not present in the dataframe, the script throws an error at the line: .withColumn("Ver_Build", cols.getItem(3))

How can I know in advance whether that last part is present in the column, so I can use lit(0) in its place? Is there a function to get the count of these column items?

Upvotes: 0

Views: 551

Answers (1)

Drashti Dobariya

Reputation: 3006

You can use the pyspark.sql.functions.size function.

size(cols) would return 3 for 89.0.0 and 4 for 89.0.0.1.

from pyspark.sql.functions import split, size, when, lit

cols = split(dfroot_temp["Product_Version"], r"\.")
dfroot_staged_temp = dfroot_temp.withColumn("Ver_Major", cols.getItem(0)) \
    .withColumn("Ver_Minor", cols.getItem(1)) \
    .withColumn("Ver_Build", when(size(cols) == 4, cols.getItem(3)).otherwise(lit(0)))
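To see what the when(size(cols) == 4, ...) expression computes for each row, here is the same per-row logic sketched in plain Python (the helper name split_version is hypothetical, purely for illustration; Spark evaluates this column-wise, not row-by-row in Python):

```python
def split_version(version):
    # Split on "." as a literal character, like split(col, r"\.") in Spark.
    parts = version.split(".")
    major, minor = parts[0], parts[1]
    # Mirror when(size(cols) == 4, cols.getItem(3)).otherwise(lit(0)):
    # use the 4th part if present, otherwise fall back to "0".
    build = parts[3] if len(parts) == 4 else "0"
    return major, minor, build

print(split_version("87.12.0.12"))  # ('87', '12', '0', ...) -> ('87', '12', '12')
print(split_version("89.0.0"))      # missing build part -> ('89', '0', '0')
```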

Upvotes: 1
