Reputation: 89
I have a dataframe with several categories and I want to use groupby
to plot each category individually. However, the first category (or the first plot) is always plotted twice.
For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
n = 100000
x = np.random.standard_normal(n)
y1 = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
y2 = 1.0 + 5.0 * x + 2.0 * np.random.standard_normal(n)
df1 = pd.DataFrame({"A": x,
                    "B": y1})
df2 = pd.DataFrame({"A": x,
                    "B": y2})
df1["Cat"] = "Cat1"
df2["Cat"] = "Cat2"
# pd.concat instead of DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df1, df2], ignore_index=True)
df.groupby("Cat").plot.hexbin(x="A", y="B", cmap="jet")
plt.show()
This will give me three plots, where Cat1 is plotted twice.
I just want two plots. What am I doing wrong?
Upvotes: 2
Views: 280
Reputation: 6668
This is expected behaviour, see the warning in the docs:
Warning: In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.
In your case, the plot function is the func, so it is called twice for the first group (Cat1), and the extra call shows up as a duplicate plot.
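One way around this is to avoid apply altogether and iterate over the groups yourself, so each group is plotted exactly once. The following is only a sketch: the helper name plot_each_group, the fixed seed, and the smaller n are my own choices for illustration and are not from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_each_group(df, by, x, y, **hexbin_kwargs):
    """Draw one hexbin figure per group (hypothetical helper, not a pandas API)."""
    axes = {}
    # Iterating over the GroupBy object yields (name, sub-frame) pairs,
    # each exactly once, so no group is plotted twice.
    for name, group in df.groupby(by):
        fig, ax = plt.subplots()
        group.plot.hexbin(x=x, y=y, ax=ax, **hexbin_kwargs)
        ax.set_title(str(name))
        axes[name] = ax
    return axes


# Same kind of data as in the question, with a smaller n and a fixed seed (assumptions).
rng = np.random.default_rng(0)
n = 1000
x_vals = rng.standard_normal(n)
df1 = pd.DataFrame({"A": x_vals, "B": 2.0 + 3.0 * x_vals + 4.0 * rng.standard_normal(n)})
df2 = pd.DataFrame({"A": x_vals, "B": 1.0 + 5.0 * x_vals + 2.0 * rng.standard_normal(n)})
df1["Cat"] = "Cat1"
df2["Cat"] = "Cat2"
df = pd.concat([df1, df2], ignore_index=True)

axes = plot_each_group(df, "Cat", x="A", y="B", cmap="jet")

if __name__ == "__main__":
    plt.show()
```

This should give two figures, one titled Cat1 and one titled Cat2. Passing ax explicitly also makes it easy to put both groups into a single figure with plt.subplots(1, 2) if you prefer.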
Upvotes: 1