Reputation: 76
I am trying to run a PySpark script on a Dataproc cluster, but it fails with the error below:
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 815, in join
if isinstance(on[0], basestring):
IndexError: list index out of range
The syntax I am using in my code is:
df1.join(df2, col1)
Any ideas?
Upvotes: 0
Views: 92
Reputation: 10677
Looking at the code, on is the col1 argument you're passing in, and the Spark code assumes that if on is not None, it contains at least one element. Is it possible that you're accidentally passing in an empty list for col1? Try printing col1 before calling join to make sure.
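If an empty list does turn out to be the cause, one way to get a clearer error is to check the argument before calling join. The following is only a minimal sketch: checked_join is a hypothetical helper name, not part of PySpark, and it assumes df1 and df2 are ordinary DataFrames and that col1 is either a column name or a list of column names.

```python
def checked_join(left, right, on, how="inner"):
    """Hypothetical helper: validate `on` before delegating to DataFrame.join.

    An empty list or tuple is rejected up front, so the failure names the
    actual problem instead of surfacing as an IndexError inside Spark.
    """
    if isinstance(on, (list, tuple)) and len(on) == 0:
        raise ValueError("join columns are empty; check how the list is built")
    # Delegate to the regular join once the argument looks sane.
    return left.join(right, on, how)


# Usage, with the names from the question (assumed to exist in your script):
# print(col1)                            # inspect what is actually passed in
# result = checked_join(df1, df2, col1)
```

The check is deliberately narrow: it only catches the empty-list case described above and passes everything else through to join unchanged.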
Upvotes: 1