SebiH

Reputation: 659

Vertical line artefacts in 2D lineplot

I'm trying to create a 2D line chart with seaborn, but I get several artefacts: lines that suddenly shoot up or down, with barely visible vertical lines, as seen here: borked lineplot

Excel, on the other hand, produces a correct visualisation from the same file: correct lineplot

My code follows the seaborn examples (a sample test.csv can be found here):

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('test.csv')
sns.set()
lp = sns.lineplot(x=data['x'], y=data['y'], sort=False, lw=1)
plt.show()

Am I doing something wrong, or is matplotlib unable to handle overlapping values?

Upvotes: 2

Views: 659

Answers (3)

Rafs

Reputation: 814

If you still want seaborn to aggregate with an estimator while plotting (rather than aggregating the data beforehand), you can pass errorbar=None to hide the vertical artefacts.

Something like:

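import matplotlib.pyplot as plt
import seaborn as sns

# df_sample: your DataFrame with event_year_month, incidents and person_id columns
plt.figure(figsize=(15,5))
sns.lineplot(
    data=df_sample,
    x='event_year_month',
    y='incidents',
    hue='person_id',
    palette='tab10',
    estimator='sum',   # still aggregate the values at each x
    errorbar=None      # but don't draw the error band
)
plt.xticks(rotation=90)
plt.show()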

Passing estimator=None also hides the error bars, but it plots every individual data point instead of an aggregate along the x-axis, which may not be what you want. Tested on seaborn 0.13.2.

Upvotes: 0

Adam

Reputation: 51

By default, Seaborn aggregates multiple observations of the y variable at the same x value by computing their mean (and draws a confidence interval around it). This behaviour is controlled by the estimator parameter; estimator=None disables it.

With this parameter added to the original code and data, the artifacts are no longer present.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.read_csv('test.csv')
sns.set()
lp = sns.lineplot(x=data['x'], y=data['y'], sort=False, lw=1, estimator=None)
plt.show()

Output

Upvotes: 5

Dawei Yang

Reputation: 626

It seems that in your data some points have the same x values. lineplot treats them as several samples of a single point, so it computes their mean as the plotted point and draws an error bar (by default, a shaded confidence-interval band) around it. The vertical artifacts are these error bars.
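A quick way to check whether this applies to your own data is to look for repeated x values with pandas before plotting. The DataFrame below is a hypothetical stand-in for test.csv; with the real file you would load it with pd.read_csv('test.csv') instead:

```python
import pandas as pd

# Hypothetical stand-in for test.csv: x = 1 and x = 2 appear twice each
data = pd.DataFrame({"x": [0, 1, 2, 3, 2, 1], "y": [0, 1, 4, 9, 5, 2]})

# Rows whose x value occurs more than once; lineplot would aggregate these
dup_mask = data["x"].duplicated(keep=False)
duplicated_rows = data[dup_mask]
print(duplicated_rows)

# The per-x mean, i.e. what lineplot's default estimator plots at each x
mean_per_x = data.groupby("x")["y"].mean()
print(mean_per_x)
```

Here rows 1, 2, 4 and 5 are flagged, and the means at x = 1 and x = 2 (1.5 and 4.5) are what the default lineplot draws in place of the original back-and-forth path.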

A hacky solution is to add a tiny random shift to your x values. In my case, I was trying to plot a PR curve and ran into the same problem. I simply added an alternating shift to make sure there are no vertical segments:

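import numpy as np
import seaborn as sns
import sklearn.metrics

# y_true, y_pred: your ground-truth labels and predicted scores
precision, recall, unused_thresholds = sklearn.metrics.precision_recall_curve(
    y_true, y_pred)

shift = 1e-6  # tiny offset; choose one well below the resolution of your x values
# Alternate +shift / -shift so no two consecutive points share an x value
shift_recall = np.empty_like(recall)
shift_recall[::2] = shift
shift_recall[1::2] = -shift

line_plot = sns.lineplot(x=recall + shift_recall, y=precision)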

Before the fix: PR Curve with vertical artifacts

After the fix: PR Curve without vertical artifacts
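The effect of the alternating shift can be checked with numpy alone. The recall array below is made up (a decreasing sequence with repeated values, the shape precision_recall_curve returns), and the shift of 1e-6 is an assumed value, since the answer does not state one:

```python
import numpy as np

# Hypothetical recall values: decreasing, with runs of repeated values
recall = np.array([1.0, 1.0, 0.5, 0.5, 0.0])
shift = 1e-6  # assumed size; keep it far below the spacing of real x values

# Alternating pattern: +shift on even indices, -shift on odd ones
shift_recall = np.empty_like(recall)
shift_recall[::2] = shift
shift_recall[1::2] = -shift
shifted = recall + shift_recall

# Distinct x values before and after the shift
print(len(np.unique(recall)), len(np.unique(shifted)))
```

Here the three distinct recall values become five, so every point has its own x. Note that the pattern only separates neighbouring duplicates: in a run of three equal values, the first and third both get +shift and still coincide.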

Upvotes: 1
