Ronnie

Reputation: 551

PySpark: convert a column of strings into a list of strings and save it in the same column

I have a DataFrame with two columns. Example:

Col1 | Col2
001  | This is the first string
002  | This is the second string.

I want to convert the DataFrame column Col2 into the following format:

Col1 | Col2
001  | ["This", "is", "the", "first", "string"]
002  | ["This", "is", "the", "second", "string"]

Is there a built-in function that can help me achieve this?

Upvotes: 2

Views: 2062

Answers (1)

Kafels

Reputation: 4069

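Just use the split function from pyspark.sql.functions. It takes the column and a regex pattern to split on, and returns an array of strings: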

import pyspark.sql.functions as f

# Overwrite Col2 with an array<string> column, splitting on a single space
df = df.withColumn('Col2', f.split('Col2', ' '))
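
For a fuller picture, here is a minimal sketch built on the question's two sample rows. Note that a plain split on ' ' would leave the trailing period in row 002 ("string."), while the desired output shows "string". The sketch therefore strips a small set of punctuation first and splits on runs of whitespace. The PUNCT character set and the helper names (tokens_reference, split_col2, demo) are illustrative assumptions, not part of the original answer.

```python
import re

# Assumption: words are separated by runs of whitespace, and the characters
# in PUNCT should be dropped (the desired output in the question has no
# trailing "."). PUNCT is a hypothetical choice; adjust it to your data.
PUNCT = r"[.,;:!?]"
SEP = r"\s+"


def tokens_reference(text):
    """Pure-Python mirror of the Spark expression below, for one value."""
    return re.split(SEP, re.sub(PUNCT, "", text).strip())


def split_col2(df):
    """Return df with Col2 replaced by an array<string> column of its words."""
    from pyspark.sql import functions as F  # imported lazily; needs PySpark

    cleaned = F.trim(F.regexp_replace(F.col("Col2"), PUNCT, ""))
    return df.withColumn("Col2", F.split(cleaned, SEP))


def demo():
    """Build the question's sample DataFrame locally and show the result."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]").appName("split-demo").getOrCreate()
    )
    df = spark.createDataFrame(
        [("001", "This is the first string"), ("002", "This is the second string.")],
        ["Col1", "Col2"],
    )
    split_col2(df).show(truncate=False)
    spark.stop()
```

Calling demo() in an environment with PySpark installed should display Col2 as arrays of words. tokens_reference only exists to check the regex logic without starting Spark, and it matches the Spark expression for simple ASCII input like the sample rows.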

Upvotes: 1
