upendra

Reputation: 2189

How do I create a seaborn line plot for a PySpark DataFrame?

I have a DataFrame with three columns and I am trying to draw a line plot with the Seaborn library, but it throws an error saying that 'DataFrame' object has no attribute 'get'. Here is my test data frame:

Age  variable   value
31   Overall    69.76751118
31   Potential  69.76751118
31   Growth     0
34   Overall    68.91176471
34   Potential  68.91176471
34   Growth     0
28   Overall    69.05803996
28   Potential  69.05803996
28   Growth     0.24643197

This is what I am trying to do with the seaborn line plot after reading in the CSV file:

test = spark.read.csv("test.csv", inferSchema=True, header=True)
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test)

And this is the error I get:

AttributeError: 'DataFrame' object has no attribute 'get'

However, when I convert the Spark DataFrame to a pandas DataFrame and use exactly the same seaborn code, it works:

test_df = test.toPandas()
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test_df)

Am I doing anything wrong with Spark DataFrames?

Upvotes: 10

Views: 18330

Answers (1)

Maviles

Reputation: 3469

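A Spark DataFrame and a pandas DataFrame share much of the same functionality, but they differ in where and how the data is stored: a Spark DataFrame is distributed across the cluster and evaluated lazily, while a pandas DataFrame lives entirely in local memory. Seaborn expects the latter.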

This step is correct:

test_df = test.toPandas()

You will always need to collect the data to the driver before you can plot it with seaborn (or matplotlib), because both libraries work only on local, in-memory data.
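
If you plot from Spark often, the conversion can be wrapped in a small helper. This is only a sketch: to_pandas_for_plot, lineplot_any, and the max_rows cap are names I made up for illustration, not part of PySpark or seaborn. It relies only on the Spark DataFrame methods limit() and toPandas(). Since toPandas() pulls every row into driver memory, the helper caps the row count first; for data larger than the cap you would want to aggregate or filter in Spark instead, because limit() simply drops the remaining rows.

```python
def to_pandas_for_plot(df, max_rows=100_000):
    """Return a local, in-memory frame that seaborn can plot.

    Objects exposing toPandas() (such as a Spark DataFrame) are collected
    to the driver; anything else is assumed to be a pandas DataFrame
    already and is returned unchanged.
    """
    if hasattr(df, "toPandas"):
        # max_rows is an arbitrary illustrative cap, not a Spark default.
        # limit() keeps at most max_rows rows and discards the rest.
        return df.limit(max_rows).toPandas()
    return df


def lineplot_any(df, **kwargs):
    """Draw a seaborn line plot from either a Spark or a pandas DataFrame."""
    # Imported here so that to_pandas_for_plot has no plotting dependency.
    import seaborn as sns

    return sns.lineplot(data=to_pandas_for_plot(df), **kwargs)
```

With the data frame from the question, the call would look like lineplot_any(test, x="Age", y="value", hue="variable").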

Upvotes: 10
