petetheat

Pandas groupby plot gives first plot twice

I have a dataframe with several categories and I want to use groupby to plot each category individually. However, the first category (or the first plot) is always plotted twice.

For example:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    n = 100000
    x = np.random.standard_normal(n)
    y1 = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
    y2 = 1.0 + 5.0 * x + 2.0 * np.random.standard_normal(n)

    df1 = pd.DataFrame({"A": x,
                        "B": y1})

    df2 = pd.DataFrame({"A": x,
                        "B": y2})

    df1["Cat"] = "Cat1"
    df2["Cat"] = "Cat2"

    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df1, df2], ignore_index=True)

    # One hexbin plot per category
    df.groupby("Cat").plot.hexbin(x="A", y="B", cmap="jet")
    plt.show()

This will give me three plots, where Cat1 is plotted twice.

I just want two plots. What am I doing wrong?

Answers (1)

Mathias711

This is expected behaviour; see the warning in the documentation for `GroupBy.apply`:

Warning: In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.
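
The side effect described in this warning can be observed with a function that records which group it was called on. This is a hypothetical illustration: the helper `record` and the list `calls` are made up for this example. On pandas versions that carry this warning, `calls` should list the first group twice, while newer releases may call the function only once per group.

```python
import numpy as np
import pandas as pd

# Small two-category frame mirroring the question's "Cat" column
df = pd.DataFrame({
    "A": np.arange(6, dtype=float),
    "Cat": ["Cat1"] * 3 + ["Cat2"] * 3,
})

calls = []  # side effect: the name of every group record() is called on


def record(group):
    # apply sets .name on each chunk to its group key
    calls.append(group.name)
    return group.sum()


result = df.groupby("Cat")["A"].apply(record)
print(calls)   # versions affected by the warning show "Cat1" twice here
print(result)  # the returned values are the same either way
```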

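`df.groupby("Cat").plot.hexbin(...)` works by passing a plotting function to `apply`, so in your case the plot function is called twice for the first group, which is why Cat1 shows up twice in the result.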
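
A common way around this is to skip the groupby plot accessor and loop over the groups yourself, because iterating over a `GroupBy` object yields each `(key, sub-DataFrame)` pair exactly once. The sketch below assumes you want both categories side by side in one figure; it reuses the question's column names, and the smaller `n` and seeded generator are only there to keep it quick and reproducible. `ax` and `title` are passed through to `DataFrame.plot`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 1000  # smaller than in the question, just to keep the sketch fast
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
df = pd.concat(
    [
        pd.DataFrame({"A": x, "B": 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n), "Cat": "Cat1"}),
        pd.DataFrame({"A": x, "B": 1.0 + 5.0 * x + 2.0 * rng.standard_normal(n), "Cat": "Cat2"}),
    ],
    ignore_index=True,
)

groups = df.groupby("Cat")
# squeeze=False keeps axes 2-D even when there is only one group
fig, axes = plt.subplots(1, groups.ngroups, figsize=(10, 4), squeeze=False)

# Each (name, group) pair is produced once, so plot runs once per category
for ax, (name, group) in zip(axes[0], groups):
    group.plot.hexbin(x="A", y="B", cmap="jet", ax=ax, title=name)

plt.show()
```

Drawing into axes you created yourself also keeps all categories in a single figure instead of one window per group.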
