bbd108

Reputation: 998

Differences in Seaborn's scatterplot and lmplot parameters

Some quick setup:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

di = sns.load_dataset('iris')

I'm using the example iris dataset here. A scatterplot is easy to create:

sns.scatterplot(x=di['sepal_length'], y=di['sepal_width'],
                hue=di['species']);

However, the same call with lmplot raises a TypeError, because lmplot requires the data argument. Even with the data argument supplied, it still does not work:

sns.lmplot(x=di['sepal_length'], y=di['sepal_width'],
           hue=di['species'], data=di);

TypeError: '<' not supported between instances of 'str' and 'float'

However, this works just fine:

sns.lmplot(x='sepal_length', y='sepal_width', hue='species', data=di);

After reading the API reference, I see that lmplot requires the data argument, but scatterplot does not. Is there something different going on under the hood here? Also, what are the best practices for the syntax here?

Upvotes: 3

Views: 3355

Answers (1)

gmds

Reputation: 19885

The reason your code does not work is misuse of the data argument. When data is passed, x, y and hue are treated as keys with which to index the object passed as data, using its __getitem__ method. So, for example, x='sepal_length', y='sepal_width', data=di is equivalent to x=di['sepal_length'], y=di['sepal_width'].

Accordingly, this runs:

sns.lmplot(x='sepal_length', y='sepal_width', hue='species', data=di);

What you tried to do was basically equivalent to x=di[di['sepal_length']], y=di[di['sepal_width']], hue=di[di['species']].
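The lookup rule described above can be sketched with a toy model. This is not seaborn's actual implementation: resolve is a hypothetical helper made up for illustration, and a plain dict stands in for the DataFrame.

```python
# Toy model of the "index into data" rule; NOT seaborn's real code.
# `resolve` is a hypothetical helper introduced only for this sketch.

def resolve(var, data=None):
    """Return the values for one plot variable."""
    if data is None:
        # No container given: the argument already holds the values.
        return var
    # Container given: the argument is used as a key (via __getitem__).
    return data[var]


# A plain dict stands in for the DataFrame.
data = {
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
}

by_name = resolve("sepal_length", data=data)  # key lookup in `data`
direct = resolve(data["sepal_length"])        # values passed as-is

# Passing the values *and* `data` tries to use the values as a key.
try:
    resolve(data["sepal_length"], data=data)
    failed = False
except (KeyError, TypeError):
    failed = True

print(by_name == direct, failed)
```

The first two calls yield the same values, while the third fails because a column of values is not a valid key. The exact exception differs between this dict-based toy and a real DataFrame, so treat only the general pattern as transferable.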

Going back to the second part of your question regarding the difference between scatterplot and lmplot:

scatterplot is an Axes-level function; it relies only on matplotlib's Axes object, which, when plotting, can work with such varied collection types as lists and np.ndarrays. Functionally, more or less, it is the same as pyplot.scatter with some default fancy colours.

On the other hand, lmplot relies on sns.FacetGrid. FacetGrid is a purely seaborn object that requires a pd.DataFrame when constructed. Accordingly, for lmplot to work, it must be given a pd.DataFrame.
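As a rough illustration of the Axes-level versus FacetGrid distinction, the sketch below calls both functions and inspects what they return. It assumes a small hand-typed DataFrame with illustrative values instead of the full iris data, and it selects the non-interactive Agg backend so it runs without a display; ci=None is only there to skip the confidence-interval bootstrap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch

import matplotlib.axes
import pandas as pd
import seaborn as sns

# Small hand-typed frame with illustrative values, standing in for iris.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1],
    "species": ["setosa"] * 3 + ["versicolor"] * 3,
})

# Axes-level: the values themselves can be passed, no `data` needed.
ax = sns.scatterplot(x=df["sepal_length"], y=df["sepal_width"],
                     hue=df["species"])

# Figure-level: column names plus the DataFrame that holds them.
grid = sns.lmplot(x="sepal_length", y="sepal_width", hue="species",
                  data=df, ci=None)

print(isinstance(ax, matplotlib.axes.Axes))  # scatterplot returns an Axes
print(isinstance(grid, sns.FacetGrid))       # lmplot returns a FacetGrid
```

As for syntax, passing column names together with data works for both functions, so it is a reasonable default when the values already live in a DataFrame.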

Upvotes: 2
