Reputation: 659
I'm trying to create a 2D line chart with seaborn, but I get several artefacts as seen here, i.e. lines that suddenly shoot down or up with barely-visible vertical lines:
Excel on the other hand produces a correct visualisation from the same file:
My code follows the seaborn examples (a sample test.csv can be found here):
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('test.csv')
sns.set()
lp = sns.lineplot(x=data['x'], y=data['y'], sort=False, lw=1)
plt.show()
Am I doing something wrong, or is matplotlib unable to handle overlapping values?
Upvotes: 2
Views: 659
Reputation: 814
If you still want to use an estimator to aggregate while plotting rather than in the data, you can use errorbar=None to hide the vertical artefacts.
Something like:
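import matplotlib.pyplot as plt
import seaborn as sns

# df_sample is your own DataFrame with the columns used below
plt.figure(figsize=(15, 5))
sns.lineplot(
    data=df_sample,
    x='event_year_month',
    y='incidents',
    hue='person_id',
    palette='tab10',
    estimator='sum',   # aggregate repeated x values by summing
    errorbar=None      # do not draw the confidence band
)
plt.xticks(rotation=90)
plt.show()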
Passing estimator=None also hides the error bars, but it plots every data point instead of aggregating along the x axis, which might not be what you want. Tested on seaborn 0.13.2.
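If aggregating in the data turns out to be preferable after all, a minimal pandas sketch could look like the following. The sample values are made up for illustration; only the column names are taken from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the snippet above.
df_sample = pd.DataFrame({
    "event_year_month": ["2023-01", "2023-01", "2023-02", "2023-02", "2023-02"],
    "person_id": ["a", "a", "a", "b", "b"],
    "incidents": [1, 2, 3, 4, 5],
})

# Sum incidents per (month, person) so each x value occurs once per line.
df_agg = (
    df_sample
    .groupby(["event_year_month", "person_id"], as_index=False)["incidents"]
    .sum()
)
print(df_agg)
```

The aggregated frame can then be passed to sns.lineplot with estimator=None, since there is nothing left for seaborn to aggregate.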
Upvotes: 0
Reputation: 51
By default, seaborn calculates the mean of multiple observations of the y variable at the same x level. This behaviour is controlled by the estimator parameter; passing estimator=None disables the aggregation, so every observation is drawn.
Adding this to the original code and data removes the artifacts:
data = pd.read_csv('test.csv')
sns.set()
lp = sns.lineplot(x=data['x'], y=data['y'], sort=False, lw=1, estimator=None)
plt.show()
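To check whether repeated x values are really the cause, something like the sketch below may help. It uses a small made-up frame in place of test.csv and shows both the duplicated rows and the per-x means that the default estimator would collapse them to.

```python
import pandas as pd

# Toy stand-in for test.csv: x = 1.0 occurs twice.
data = pd.DataFrame({"x": [0.0, 1.0, 1.0, 2.0], "y": [0.0, 1.0, 3.0, 2.0]})

# Mark every row whose x value appears more than once.
dup_mask = data["x"].duplicated(keep=False)
print(data[dup_mask])

# What a mean estimator reduces the data to: one y value per distinct x.
collapsed = data.groupby("x", as_index=False)["y"].mean()
print(collapsed)
```

If dup_mask contains no True values, the artifacts have a different cause and estimator=None will not change the plot.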
Upvotes: 5
Reputation: 626
It seems that some points in your data have the same x value. lineplot treats them as a single point with several samples, so it plots their mean as the actual point and draws the confidence interval (the default error bar) around it. The vertical artifacts are those intervals.
A hacky solution is to add a tiny shift to your x values. In my case, I was trying to plot a PR curve and ran into the same problem. I simply added an alternating shift so that neighbouring points never share an x value:
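import numpy as np
import seaborn as sns
import sklearn.metrics
shift = 1e-6  # assumed magnitude; pick something far below your x resolution
# y_true and y_pred are the labels and predicted scores from your own model
precision, recall, unused_thresholds = sklearn.metrics.precision_recall_curve(
    y_true, y_pred)
shift_recall = np.empty_like(recall)
shift_recall[::2] = shift     # even indices move right
shift_recall[1::2] = -shift   # odd indices move left
line_plot = sns.lineplot(x=recall + shift_recall, y=precision)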
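One caveat with the alternating shift: if three or more consecutive points share an x value, the first and third receive the same offset and still collide. The sketch below illustrates this on made-up recall values and compares it with a steadily increasing offset, which is my own variation rather than part of the approach above; the shift magnitude is likewise an assumption.

```python
import numpy as np

# Toy recall values with a run of three ties at 0.5.
recall = np.array([1.0, 0.5, 0.5, 0.5, 0.0])
shift = 1e-9  # assumed magnitude

# Alternating shift: +shift on even indices, -shift on odd indices.
alternating = np.empty_like(recall)
alternating[::2] = shift
alternating[1::2] = -shift
x_alt = recall + alternating

# Variation: an offset that grows with the index, so tied neighbours differ.
ramp = np.arange(recall.size) * shift
x_ramp = recall + ramp

n_unique_alt = np.unique(x_alt).size
n_unique_ramp = np.unique(x_ramp).size
print(n_unique_alt, n_unique_ramp)
```

Either way the x values are slightly distorted, so passing estimator=None is usually the cleaner fix when it is available.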
Upvotes: 1