Reputation: 1073
I am a beginner in Python and Spark.
My Product_Version column looks like 87.12.0.12; sometimes it has only 3 parts: 89.0.0.
I want to split it on the separator into different columns, so I did:
from pyspark.sql.functions import split

cols = split(dfroot_temp["Product_Version"], r"\.")
dfroot_staged_temp = dfroot_temp.withColumn("Ver_Major", cols.getItem(0)) \
    .withColumn("Ver_Minor", cols.getItem(1)) \
    .withColumn("Ver_Build", cols.getItem(3))
Now, when the last part is not present in the dataframe, the script throws an error at the line .withColumn("Ver_Build", cols.getItem(3)).
How can I know in advance whether that last part is present in the column, so I can put lit(0) in its place? Is there any function to get the count of these column items?
Upvotes: 0
Views: 551
Reputation: 3006
You can use the pyspark.sql.functions.size function.
size(cols)
would return 3 for 89.0.0 and 4 for 89.0.0.1.
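For example, here is a quick check on a toy DataFrame built from those two sample values (the DataFrame and the num_parts alias are just for illustration, assuming spark is your SparkSession):

from pyspark.sql.functions import split, size

# Toy data just to illustrate the size() values (not from the question)
df = spark.createDataFrame([("89.0.0",), ("89.0.0.1",)], ["Product_Version"])
df.select(size(split(df["Product_Version"], r"\.")).alias("num_parts")).show()
# +---------+
# |num_parts|
# +---------+
# |        3|
# |        4|
# +---------+

So you can wrap getItem(3) in a when/otherwise on that size: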
from pyspark.sql.functions import split, when, size, lit

cols = split(dfroot_temp["Product_Version"], r"\.")
dfroot_staged_temp = dfroot_temp.withColumn("Ver_Major", cols.getItem(0)) \
    .withColumn("Ver_Minor", cols.getItem(1)) \
    .withColumn("Ver_Build", when(size(cols) == 4, cols.getItem(3)).otherwise(lit(0)))
Upvotes: 1