Reputation: 89
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession
avg_calc = spark.read.csv("quiz2_algo.csv", header=True, inferSchema=True)
header = avg_calc.first()
no_header = avg_calc.subtract(header)
no_header
avg_calc contains 2 columns, and I am trying to remove the 1st row from both columns; however, I am receiving the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-50-24671d91e691> in <module>()
----> 1 no_header = avg_calc.subtract(header)
C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\dataframe.pyc in subtract(self, other)
1391
1392 """
-> 1393 return DataFrame(getattr(self._jdf, "except")(other._jdf), self.sql_ctx)
1394
1395 @since(1.4)
C:\spark\spark-2.3.0-bin-hadoop2.7\python\pyspark\sql\types.pyc in __getattr__(self, item)
1559 raise AttributeError(item)
1560 except ValueError:
-> 1561 raise AttributeError(item)
1562
1563 def __setattr__(self, key, value):
AttributeError: _jdf
If anyone can help, I would appreciate it!
Example of the data: avg_calc.show(5)
Upvotes: 0
Views: 1285
Reputation: 16549
first() returns a Row object rather than a DataFrame, which is what subtract requires. See http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.first
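You can see the mismatch directly (a minimal sketch, assuming the avg_calc DataFrame from the question has been loaded):

>>> header = avg_calc.first()
>>> type(header)  # a single Row, not a DataFrame, so it has no _jdf attribute
<class 'pyspark.sql.types.Row'>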
You could try something like:
avg_calc.subtract(avg_calc.limit(1))
For example:
>>> from pyspark.sql import Row
>>> df = spark.createDataFrame([Row(x=1), Row(x=2)])
>>> print(df.subtract(df.limit(1)).toPandas())
   x
0  2
Apply an ordering to your dataframe to ensure the row you would like to drop is in the expected position:
>>> from pyspark.sql import functions as F
>>> df = df.orderBy(F.col('CS202 Quiz#2').desc())
>>> df = df.subtract(df.limit(1))
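One caveat: subtract behaves like SQL's EXCEPT, so it de-duplicates the result and removes every copy of the dropped row. A quick illustration (a sketch; note that limit(1) without an ordering is not guaranteed to pick a particular row):

>>> df2 = spark.createDataFrame([Row(x=1), Row(x=1), Row(x=2)])
>>> # assuming limit(1) picked one of the x=1 rows; both copies are removed
>>> print(df2.subtract(df2.limit(1)).toPandas())
   x
0  2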
Upvotes: 2