Reputation: 1
from pyspark.sql.functions import split, explode
shakespeareDF = sqlContext.read.text(fileName).select(removePunctuation(col('value')))
shakespeareDF.show(15, truncate=False)
The dataframe looks like this:
ss = split(shakespeareDF.sentence," ")
shakeWordsDFa =explode(ss)
shakeWordsDF_S=sqlContext.createDataFrame(shakeWordsDFa,'word')
Any idea what I am doing wrong? The error says: Column is not iterable.
What should I do? I just want to turn shakeWordsDFa into a DataFrame and rename its column.
Upvotes: 0
Views: 6706
Reputation: 330063
Just use select:
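from pyspark.sql.functions import explode, split

# Small sample DataFrame with a single "sentence" column
shakespeareDF = sc.parallelize([
    ("from fairest creatures we desire increase", ),
    ("that thereby beautys rose might never die", ),
]).toDF(["sentence"])

# Split each sentence on spaces, emit one row per word, and name the column "word"
(shakespeareDF
    .select(explode(split("sentence", " ")).alias("word"))
    .show(4))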
## +---------+
## |     word|
## +---------+
## |     from|
## |  fairest|
## |creatures|
## |       we|
## +---------+
## only showing top 4 rows
Spark SQL Columns are not data structures. They are not bound to any data and are meaningful only when evaluated in the context of a specific DataFrame. In this sense, Columns behave more like functions.
Upvotes: 3