Reputation: 1255
I'm looking for a way to take the characters after the 2nd position of a string in a DataFrame column, but only if the string is longer than 2 characters, and put them into another column; otherwise the new column should be null. The Spark DataFrame has several other columns.
I have a Spark dataframe that looks like this:
animal
======
mo
cat
mouse
snake
reptiles
I want something like this:
remainder
========
null
t
use
ake
ptiles
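To pin down the rule, here is a minimal plain-Python sketch of what I mean for a single value. The function name `remainder` is just for illustration and is not part of pandas or PySpark; treating a missing value as null is my assumption.

```python
def remainder(value):
    """Return everything after the first two characters, or None.

    Assumption: values that are missing or have 2 or fewer characters map to None.
    """
    if value is None or len(value) <= 2:
        return None
    return value[2:]  # Python slices are 0-based, so [2:] skips two characters


animals = ["mo", "cat", "mouse", "snake", "reptiles"]
remainders = [remainder(a) for a in animals]
print(remainders)
```

This is only a per-row statement of the requirement; I would prefer a solution built from native column functions rather than a row-by-row function.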
I can do it with np.where on a pandas DataFrame like this:
import numpy as np
# .str.len() gives the per-row length; len(df['animal']) would give the number of rows
df['remainder'] = np.where(df['animal'].str.len() > 2, df['animal'].str[2:], None)
How do I do the same with a PySpark DataFrame?
Upvotes: 0
Views: 587
Reputation: 5052
You can do this by combining when/otherwise with substring:
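from io import StringIO
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Entry point used to build the Spark DataFrame (assumed to be a SparkSession here)
sql = SparkSession.builder.getOrCreate()

s = StringIO("""
animal
mo
cat
mouse
snake
reptiles
""")
# Build a small pandas DataFrame, then convert it to a Spark DataFrame
df = pd.read_csv(s, delimiter=',')
sparkDF = sql.createDataFrame(df)
sparkDF.show()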
+--------+
|  animal|
+--------+
|      mo|
|     cat|
|   mouse|
|   snake|
|reptiles|
+--------+
sparkDF = sparkDF.withColumn('animal_length', F.length(F.col('animal'))) \
    .withColumn('remainder',
                # substring positions are 1-based, so position 3 skips the first two characters;
                # 1000 is an arbitrary upper bound on the remaining length
                F.when(F.col('animal_length') > 2, F.substring(F.col('animal'), 3, 1000))
                 .otherwise(None)) \
    .drop('animal_length')
sparkDF.show()
+--------+---------+
|  animal|remainder|
+--------+---------+
|      mo|     null|
|     cat|        t|
|   mouse|      use|
|   snake|      ake|
|reptiles|   ptiles|
+--------+---------+
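One detail that is easy to trip over: F.substring counts positions from 1, whereas Python slicing counts from 0. So pandas' .str[2:] corresponds to starting at position 3 in Spark, while position 2 would keep the second character. The sketch below is a rough pure-Python model of that indexing, not Spark code; `substring_1_based` is a hypothetical helper that only handles positions of 1 or more.

```python
def substring_1_based(text, pos, length):
    """Hypothetical model of a 1-based substring, for pos >= 1 only."""
    if pos < 1:
        raise ValueError("this sketch only models positions >= 1")
    start = pos - 1  # convert the 1-based position to a 0-based index
    return text[start:start + length]


from_second = substring_1_based("cat", 2, 1000)  # starts at the 2nd character
from_third = substring_1_based("cat", 3, 1000)   # starts at the 3rd character
print(from_second, from_third)
```

With position 2 the result for "cat" is "at"; with position 3 it is "t", which matches what the question asks for.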
Upvotes: 1