Some quick imports:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
di = sns.load_dataset('iris')
I'm using the example iris dataset here. A scatterplot can be created easily as follows:
sns.scatterplot(x=di['sepal_length'], y=di['sepal_width'],
hue=di['species']);
However, lmplot raises a TypeError if the data argument is omitted, because it requires data. Even with the data argument supplied, it still does not work:
sns.lmplot(x=di['sepal_length'], y=di['sepal_width'],
hue=di['species'], data=di);
TypeError: '<' not supported between instances of 'str' and 'float'
However, this works just fine:
sns.lmplot(x='sepal_length', y='sepal_width', hue='species', data=di);
After reading the API reference, I see that lmplot requires the data argument but scatterplot does not. Is there something different going on under the hood here? Also, what are the best practices for the syntax here?
The reason your code does not work is a misuse of the data argument. When data is passed, x, y and hue are treated as keys with which to index the object passed in data, using its __getitem__ method. So, for example, x='sepal_length', y='sepal_width', data=di is equivalent to x=di['sepal_length'], y=di['sepal_width'].
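A minimal sketch of that lookup rule, assuming a tiny hand-made DataFrame in place of the iris data so nothing has to be downloaded. The helper resolve is hypothetical: it only mimics the idea described above and is not seaborn's internal code.

```python
import pandas as pd

# Small stand-in for the iris data (values chosen for illustration)
di = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "species": ["setosa", "setosa", "virginica"],
})

def resolve(var, data=None):
    """Hypothetical helper: look a string key up in `data`, else pass the value through."""
    if data is not None and isinstance(var, str):
        # data[var] calls DataFrame.__getitem__ with the column name
        return data[var]
    return var

# Naming the column plus data=di yields the same vector as passing the column itself
by_name = resolve("sepal_length", data=di)
by_value = resolve(di["sepal_length"])
print(by_name.equals(by_value))  # True
```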
Accordingly, this runs:
sns.lmplot(x='sepal_length', y='sepal_width', hue='species', data=di);
What you tried to do was basically equivalent to x=di[di['sepal_length']], y=di[di['sepal_width']], hue=di[di['species']].
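To see why that cannot work, here is a small sketch (again on a made-up stand-in for the iris frame) of indexing a DataFrame with one of its own columns: the column's values are used as column labels, none of which exist, so pandas raises a KeyError. This illustrates the "basically equivalent" call above; it does not reproduce seaborn's exact code path or the TypeError from the question.

```python
import pandas as pd

# Small stand-in for the iris data (values chosen for illustration)
di = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "species": ["setosa", "setosa", "virginica"],
})

raised = False
try:
    # The floats 5.1, 4.9, 6.3 are treated as column labels, which do not exist
    di[di["sepal_length"]]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```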
Going back to the second part of your question, regarding the difference between scatterplot and lmplot:
scatterplot is an Axes-level function; it relies only on matplotlib's Axes object, which can plot from such varied collection types as lists and np.ndarrays. That is why its data argument is optional. Functionally, it is more or less the same as pyplot.scatter with fancier default colours.
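As a quick sketch of that point, assuming a reasonably recent seaborn with keyword arguments: scatterplot accepts plain lists and arrays with no DataFrame and no data argument, and hands back an ordinary matplotlib Axes. The values below are made up, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# A plain list, a NumPy array and a list of labels: no DataFrame involved
x = [5.1, 4.9, 6.3, 5.8]
y = np.array([3.5, 3.0, 3.3, 2.7])
hue = ["setosa", "setosa", "virginica", "virginica"]

# Axes-level function: draws on (and returns) a single matplotlib Axes
ax = sns.scatterplot(x=x, y=y, hue=hue)
plt.close("all")
```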
On the other hand, lmplot relies on sns.FacetGrid (documented in the seaborn API reference). FacetGrid is a seaborn-specific object that requires a pd.DataFrame when it is constructed. Therefore, for lmplot to work, it must be given a pd.DataFrame through data, with x, y and hue naming its columns.
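If your data starts out as separate vectors, one way to follow that rule is to collect them into a DataFrame first and then pass column names, as in the working call in the question. A minimal sketch with synthetic values (the column names simply mirror the iris ones); ci=None only skips the bootstrapped confidence band to keep it fast.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic vectors standing in for data that is not yet in a DataFrame
rng = np.random.default_rng(0)
length = rng.uniform(4.5, 7.5, size=30)
width = 0.4 * length + rng.normal(0, 0.2, size=30)
species = np.repeat(["a", "b", "c"], 10)

# lmplot builds a FacetGrid, so gather the vectors into a DataFrame first...
df = pd.DataFrame({"sepal_length": length, "sepal_width": width, "species": species})

# ...then refer to the columns by name
g = sns.lmplot(x="sepal_length", y="sepal_width", hue="species", data=df, ci=None)
print(type(g).__name__)  # FacetGrid
plt.close("all")
```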