Pandas groupby plot gives first plot twice

Question

I have a dataframe with several categories and I want to use groupby to plot each category individually. However, the first category (or the first plot) is always plotted twice.

For example:

    import pandas as pd
    import numpy as np 
    import matplotlib.pyplot as plt

    n = 100000
    x = np.random.standard_normal(n)
    y1 = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
    y2 = 1.0 + 5.0 * x + 2.0 * np.random.standard_normal(n)

    df1 = pd.DataFrame({"A": x,
                        "B": y1})

    df2 = pd.DataFrame({"A": x,
                        "B": y2})

    df1["Cat"] = "Cat1"
    df2["Cat"] = "Cat2"

    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
    df = pd.concat([df1, df2], ignore_index=True)

    df.groupby("Cat").plot.hexbin(x="A", y="B", cmap="jet")
    plt.show()

This will give me three plots, where Cat1 is plotted twice.

I just want two plots. What am I doing wrong?

Mathias711 · Accepted Answer

This is expected behaviour, see the warning in the docs:

Warning: In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.

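In your case, the groupby plot accessor goes through apply, so the plot function is called twice for the first group. Plotting is a side effect, which is why Cat1 shows up as an extra figure.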
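One way to sidestep this is to iterate over the groups yourself, so that the plotting call runs exactly once per category. The following is only a minimal sketch, not the only fix: it rebuilds a smaller version of the data from the question (the seed, the sample size of 1000, and the `axes` dictionary are my own choices for illustration) and draws each group on its own figure. As far as I know, later pandas releases (reportedly 0.25 and up) evaluate the first group only once, but an explicit loop should behave the same regardless of version.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Smaller, seeded version of the data from the question (illustrative values).
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
y1 = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)
y2 = 1.0 + 5.0 * x + 2.0 * rng.standard_normal(n)

df = pd.concat(
    [
        pd.DataFrame({"A": x, "B": y1, "Cat": "Cat1"}),
        pd.DataFrame({"A": x, "B": y2, "Cat": "Cat2"}),
    ],
    ignore_index=True,
)

# Iterate over the groups explicitly: one figure per category,
# and the plotting call runs exactly once for each group.
axes = {}  # hypothetical helper dict, kept only so the axes can be reused later
for name, group in df.groupby("Cat"):
    fig, ax = plt.subplots()
    group.plot.hexbin(x="A", y="B", cmap="jet", ax=ax)
    ax.set_title(name)
    axes[name] = ax
```

Call `plt.show()` afterwards to display the two figures. Passing `ax=ax` keeps each hexbin on the figure created for its category, and the title makes it clear which plot belongs to which group.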
