Reputation: 337
I am struggling with some PySpark code. In particular, I'd like to call a function on a Column object, col, which is not iterable.
from pyspark.sql.functions import col, lower, regexp_replace, split
from googletrans import Translator

translator = Translator()

def clean_text(c):
    c = lower(c)
    c = regexp_replace(c, r"^rt ", "")
    c = regexp_replace(c, r"(https?\://)\S+", "")
    c = regexp_replace(c, r"[^a-zA-Z0-9\s]", "")  # remove punctuation
    c = regexp_replace(c, r"\n", " ")
    c = regexp_replace(c, r" +", " ")  # collapse repeated spaces
    # c = translator.translate(c, dest='en', src='auto')
    return c

clean_text_df = uncleanedText.select(clean_text(col("unCleanedCol")).alias("sentence"))
clean_text_df.printSchema()
clean_text_df.show(10)
As soon as I run the code with the line c = translator.translate(c, dest='en', src='auto') uncommented, the error shown by Spark is TypeError: Column is not iterable.

What I would like to do is a word-by-word translation:
From:
+--------------------+
| sentence|
+--------------------+
|ciao team there a...|
|dear itteam i urg...|
|buongiorno segnal...|
|hi team regarding...|
|hello please add ...|
|ciao vorrei effet...|
|buongiorno ho vis...|
+--------------------+
To:
+--------------------+
| sentence|
+--------------------+
|hello team there ...|
|dear itteam i urg...|
|goodmorning segna...|
|hi team regarding...|
|hello please add ...|
|hello would effet...|
|goodmorning I see...|
+--------------------+
The schema of the DataFrame is:

root
 |-- sentence: string (nullable = true)
Could anyone please help me?
Thank you very much
Upvotes: 2
Views: 1522
Reputation: 43544
PySpark is just the Python API written to support Apache Spark. If you want to use custom Python functions, you will have to define a user defined function (udf).
Keep your clean_text() function as is (with the translate line commented out) and try the following:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
from googletrans import Translator

translator = Translator()

def translate(c):
    # googletrans returns a Translated object; .text holds the string
    return translator.translate(c, dest='en', src='auto').text

translateUDF = udf(translate, StringType())

clean_text_df = uncleanedText.select(
    translateUDF(clean_text(col("unCleanedCol"))).alias("sentence")
)
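To sanity-check the UDF before running it on the full DataFrame, you can try it on a small hand-made DataFrame first. A minimal sketch, assuming an active SparkSession named spark and a working googletrans install:

# Hypothetical smoke test with two rows mirroring your schema
test_df = spark.createDataFrame(
    [("ciao team",), ("buongiorno a tutti",)],
    ["unCleanedCol"],
)
test_df.select(
    translateUDF(clean_text(col("unCleanedCol"))).alias("sentence")
).show(truncate=False)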
The other functions in your original clean_text (lower and regexp_replace) are built-in Spark functions and operate on a pyspark.sql.Column.
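That is also why the original call failed: inside clean_text, c is a lazy Column expression rather than a Python string, so googletrans has nothing concrete to work on. You can see this by printing an expression (the exact repr varies by Spark version):

print(lower(col("unCleanedCol")))
# prints something like: Column<'lower(unCleanedCol)'>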
Be aware that using this udf will bring a performance hit, since every row has to be serialized between the JVM and the Python worker. See: Spark functions vs UDF performance?
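If that overhead matters, Spark 2.3+ also offers vectorized pandas UDFs, which hand your function a whole pandas Series per batch instead of one value at a time. A rough sketch, assuming Spark 3.x type-hint style and the same translator object (the per-value network calls to googletrans remain, so batching only trims Spark's serialization cost):

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

@pandas_udf(StringType())
def translateBatchUDF(s: pd.Series) -> pd.Series:
    # Still one googletrans call per value; only Spark-side overhead shrinks
    return s.apply(lambda text: translator.translate(text, dest='en', src='auto').text)

clean_text_df = uncleanedText.select(
    translateBatchUDF(clean_text(col("unCleanedCol"))).alias("sentence")
)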
Upvotes: 3