isaacsultan

Reputation: 3974

Plot boxplot and line from pandas

I am trying to reproduce this graph - a line plot with a boxplot at every point:

Imgur

However, the line plot is always starting at the origin instead of at the first x tick:

Imgur

I have collected my data in a pandas DataFrame, where each column header is a k_e value (the x axis) and each column holds all of the data points for that value.

I am plotting the mean of each column and the boxplot like so:

df = df.astype(float)

_, ax = plt.subplots()
df.mean().plot(ax=ax)
df.boxplot(showfliers=False, ax=ax)

plt.xlabel(r'$k_{e}$')
plt.ylabel('Test error rate')
plt.title(r'Accuracies with different $k_{e}$')

plt.show()

Following the question linked below, I am passing the same 'ax' to both plotting calls, but this does not help.

plot line over boxplot using pandas DateFrame

EDIT: Here is a minimal example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

test_errors_dict = dict()
np.random.seed(40)

test_errors_dict[2] = np.random.rand(20)
test_errors_dict[3] = np.random.rand(20)
test_errors_dict[5] = np.random.rand(20)

df = pd.DataFrame(data=test_errors_dict)
df = df.astype(float)

_, ax = plt.subplots()
df.mean().plot(ax=ax)
df.boxplot(showfliers=False, ax=ax)

plt.show()

Result: Imgur

As shown above, the line plot does not align with the boxplots.

Upvotes: 0

Views: 2572

Answers (1)

ImportanceOfBeingErnest

Reputation: 339765

The boxes are drawn at positions 1, 2, 3 (the column names 2, 3, 5 are only used as tick labels), while the line is plotted at its index values 2, 3, 5. You can reindex the mean Series so that it also uses the positions 1, 2, 3.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

test_errors_dict = dict()
np.random.seed(40)

test_errors_dict[2] = np.random.rand(20)
test_errors_dict[3] = np.random.rand(20)
test_errors_dict[5] = np.random.rand(20)

df = pd.DataFrame(data=test_errors_dict)
df = df.astype(float)

mean = df.mean()
mean.index = np.arange(1,len(mean)+1)

_, ax = plt.subplots()
mean.plot(ax=ax)
df.boxplot(showfliers=False, ax=ax)

plt.show()
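If you would rather keep the real k_e spacing on the x axis (2, 3, 5) instead of moving the line to 1, 2, 3, one possible alternative is to place the boxes at the column values. The following is only a sketch of that idea, not part of the approach above: it calls matplotlib's ax.boxplot directly with its positions argument rather than df.boxplot. The Agg backend, the output file name and the axis limits are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(40)
df = pd.DataFrame({k: np.random.rand(20) for k in (2, 3, 5)}).astype(float)

positions = list(df.columns)  # the real k_e values: 2, 3, 5
mean = df.mean()              # Series indexed by the same k_e values

fig, ax = plt.subplots()
# Draw one box per column at its k_e value instead of at 1, 2, 3.
ax.boxplot([df[c].to_numpy() for c in df.columns],
           positions=positions, showfliers=False)
# The line uses the same x values, so it passes through the boxes.
(line,) = ax.plot(mean.index.to_numpy(), mean.to_numpy())

# Set ticks and limits explicitly so the labels match the positions.
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in positions])
ax.set_xlim(min(positions) - 1, max(positions) + 1)

fig.savefig("boxplot_with_mean.png")  # hypothetical output file name
```

With this variant the gap between 3 and 5 is twice as wide as the gap between 2 and 3, which may or may not be what you want; the reindexing approach above keeps the boxes evenly spaced.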


Upvotes: 1
