Farhad

Reputation: 147

PySpark error: Column is not iterable

When I try to run the code below in Spark, I get the error "Column is not iterable".

Here is the traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-33-4bfff78eeaad> in <module>()
  1 feature_text='lepton pT, lepton eta, lepton phi, missing energy 
magnitude, missing energy phi, jet 1 pt, jet 1 eta, jet 1 phi, jet 1 b-tag, 
jet 2 pt, jet 2 eta, jet 2 phi, jet 2 b-tag, jet 3 pt, jet 3 eta, jet 3 phi, 
jet 3 b-tag, jet 4 pt, jet 4 eta, jet 4 phi, jet 4 b-tag, m_jj, m_jjj, m_lv, 
m_jlv, m_bb, m_wbb, m_wwbb'
----> 2 features=[strip(a) for a in split(feature_text,',')]

/opt/ibm/spark/python/pyspark/sql/column.py in __iter__(self)
342 
343     def __iter__(self):
--> 344         raise TypeError("Column is not iterable")
345 
346     # string methods

TypeError: Column is not iterable

Code:

feature_text='lepton pT, lepton eta, lepton phi, missing energy magnitude, missing energy phi, jet 1 pt, jet 1 eta, jet 1 phi, jet 1 b-tag, jet 2 pt, jet 2 eta, jet 2 phi, jet 2 b-tag, jet 3 pt, jet 3 eta, jet 3 phi, jet 3 b-tag, jet 4 pt, jet 4 eta, jet 4 phi, jet 4 b-tag, m_jj, m_jjj, m_lv, m_jlv, m_bb, m_wbb, m_wwbb'

features=[strip(a) for a in split(feature_text,',')]

Upvotes: 0

Views: 237

Answers (1)

dspencer

Reputation: 4481

It looks like you are using the pyspark.sql.functions.split function, when you're really looking for the string split method.

Using the latter, you could generate a list of feature names, without requiring a list comprehension, from your string using:

features = feature_text.split(", ")

Both strip and split are methods of str rather than standalone functions, so they are called on the string itself, as in my_string.strip(" ") and my_string.split(",").
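If the spacing around the commas is not perfectly consistent, splitting on "," and stripping each piece is a bit more forgiving than splitting on ", ". Here is a minimal sketch of that, using a shortened sample string with deliberately uneven spacing (the sample is an assumption for illustration, not your full feature list):

```python
# Shortened, hypothetical sample with uneven spacing around the commas.
feature_text = "lepton pT, lepton eta,  lepton phi ,m_jj, m_wwbb"

# str.split and str.strip are called on the string objects themselves,
# so no pyspark.sql.functions names are involved here.
features = [name.strip() for name in feature_text.split(",")]

print(features)
```

If you also need the Spark column functions in the same notebook, importing the module under an alias (for example `from pyspark.sql import functions as F`) keeps names like `F.split` clearly separate from the string methods.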

Upvotes: 1
