CODEWITHSUNDEEP

python apache-spark pyspark

Reputation: 1

Word count: 'Column' object is not callable

from pyspark.sql.functions import split, explode, col

# removePunctuation is the asker's own helper (not shown); it is not part of pyspark
shakespeareDF = sqlContext.read.text(fileName).select(removePunctuation(col('value')))

shakespeareDF.show(15, truncate=False)

The dataframe looks like this:

ss = split(shakespeareDF.sentence, " ")
shakeWordsDFa = explode(ss)

shakeWordsDF_S = sqlContext.createDataFrame(shakeWordsDFa, 'word')

Any idea what I am doing wrong? The error says Column is not iterable.

What should I do? I just want to turn shakeWordsDFa into a DataFrame and rename the column.

Upvotes: 0

Views: 6706

Answers (1)

Reputation: 330063

Just use select:

shakespeareDF = sc.parallelize([
    ("from fairest creatures we desire increase", ),
    ("that thereby beautys rose might never die", ),
]).toDF(["sentence"])

(shakespeareDF
    .select(explode(split("sentence", " ")).alias("word"))
    .show(4))

## +---------+
## |     word|
## +---------+
## |     from|
## |  fairest|
## |creatures|
## |       we|
## +---------+
## only showing top 4 rows

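Spark SQL columns are not data structures. They are not bound to any data and are meaningful only when evaluated in the context of a specific DataFrame. In this sense, Columns behave more like functions, which is why passing one to createDataFrame fails: there is no data inside it to iterate over.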

Upvotes: 3
