Reputation: 2189
I have a data frame with three columns and I am trying to draw a line plot with the Seaborn library, but it throws an error saying that 'DataFrame' object has no attribute 'get'. Here is my test data frame:
Age variable value
31 Overall 69.76751118
31 Potential 69.76751118
31 Growth 0
34 Overall 68.91176471
34 Potential 68.91176471
34 Growth 0
28 Overall 69.05803996
28 Potential 69.05803996
28 Growth 0.24643197
This is what I am trying to do with the seaborn line plot after reading in the CSV file:
test = spark.read.csv("test.csv", inferSchema=True, header=True)
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test)
And the error that I get is this:
AttributeError: 'DataFrame' object has no attribute 'get'
However, when I convert the data frame to a pandas data frame and use exactly the same seaborn code, it works:
test_df = test.toPandas()
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test_df)
Am I doing anything wrong with Spark data frames?
Upvotes: 10
Views: 18330
Reputation: 3469
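A Spark DataFrame and a pandas DataFrame share much of the same functionality, but they differ in where and how they hold data: a Spark DataFrame is distributed across the cluster and evaluated lazily, while a pandas DataFrame lives entirely in local memory. Seaborn expects the latter, which is why passing the Spark DataFrame directly fails with the 'get' error.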
This step is correct:
test_df = test.toPandas()
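You will always need to collect the data into pandas before you can plot it with seaborn (or even matplotlib). Keep in mind that toPandas() pulls every row onto the driver, so filter or aggregate a large data frame first.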
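As a rough sketch of that collect-then-plot pattern, the helper below caps the number of rows before converting. The names collect_for_plot, plot_lines and the max_rows cap are hypothetical and not part of the question; limit() and toPandas() are the regular Spark DataFrame methods, and the column names are taken from the test data above.

```python
def collect_for_plot(spark_df, max_rows=100_000):
    """Return at most max_rows rows of a Spark DataFrame as a pandas DataFrame.

    max_rows is an assumed safety cap so a large table is not pulled
    onto the driver by accident; pick a value that fits your memory.
    """
    # limit() is lazy; toPandas() runs the job and collects the rows locally.
    return spark_df.limit(max_rows).toPandas()


def plot_lines(spark_df):
    """Collect the Spark DataFrame, then draw the line plot from the question."""
    # Imported here so collect_for_plot can be used without seaborn installed.
    import seaborn as sns

    pdf = collect_for_plot(spark_df)
    return sns.lineplot(x="Age", y="value", hue="variable", data=pdf)
```

With the data frame from the question this would be called as plot_lines(test). Note that limit() simply truncates, so for data that really is too large an aggregation in Spark before collecting is usually the better choice.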
Upvotes: 10