Glenn Becker

Reputation: 33

pandas: plot mean values within bins - formatting help needed

I am doing some solar system dynamics simulations, and have been using this project as an excuse to teach myself some python/pandas. The resulting data set has a little over 1000 records, with values for orbital inclination, eccentricity and so on for each of the bodies involved.

I'm trying to use pandas to study the mean orbital inclinations of the ~1000 bodies ('test particles') in the result data, as a function of semi-major axis.

So far, what I've been doing is this:

1) read the data into a dataframe

df = pd.read_csv('final.csv')

2) limit the data to a range of semi-major axis values (the particles 'spread out' over the course of the simulations, but I want to limit my analysis to this range)

cf = df[df.a.between(30,80)]

3) plot the mean value for inclination for a given number of bins

cf.groupby(pd.cut(cf.a, 80))['inc'].mean().plot()

This creates an acceptable plot, but formatting-wise it has a couple of problems: unless it's completely maximized to fill my screen, the numbers along the x axis get squished together and overlap. They are also not exactly what I'd like to see: they show the max and min for bins, where I would prefer a straight ticking by 5s or something similar.

[Image: pandas output from above command]

I've tried passing arguments (x=None, xticks=None) to the plot() call above, but this has had no effect on the resulting plot. Is it possible to control the plot output with the way I'm doing this?

Thanks,

G

Upvotes: 3

Views: 4722

Answers (1)

KPLauritzen

Reputation: 1869

When I want to do something like this, I go to matplotlib directly. I will show a small example with this sample data:

df = pd.DataFrame([[1, 2], [2, 7], [3, 6], [4,7], [5,3]], columns=['A', 'B'])

Instead of letting pd.cut choose the bin edges, I define the edges explicitly with np.linspace and pass them to pd.cut. So

bins = np.linspace(0,5,4)
group = df.groupby(pd.cut(df.A, bins))

Now, to plot it, I want the middle of the bins

plot_centers = (bins[:-1] + bins[1:]) / 2
plot_values = group.B.mean()

and plot with

plt.plot(plot_centers, plot_values)
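Put together, the steps above could look roughly like the sketch below. The imports, the axis labels and the integer tick positions are my additions for illustration. I also pass observed=False to groupby, on the assumption that you want empty bins kept in the result so it stays the same length as plot_centers (newer pandas versions warn if you leave it out when grouping by a categorical).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from above
df = pd.DataFrame([[1, 2], [2, 7], [3, 6], [4, 7], [5, 3]], columns=["A", "B"])

# Four edges give three equal-width bins on A between 0 and 5
bins = np.linspace(0, 5, 4)

# observed=False keeps empty bins, so the result has one row per bin
group = df.groupby(pd.cut(df["A"], bins), observed=False)

# Midpoint of each bin, used as the x position of its mean
plot_centers = (bins[:-1] + bins[1:]) / 2
plot_values = group["B"].mean()

fig, ax = plt.subplots()
ax.plot(plot_centers, plot_values.to_numpy(), marker="o")
ax.set_xticks(np.arange(0, 6, 1))  # ticks at whole numbers, independent of the bin edges
ax.set_xlabel("A")
ax.set_ylabel("mean B")
```

Because the x values are now plain numbers rather than interval labels, the tick positions are whatever you pass to set_xticks. Call plt.show() or fig.savefig(...) afterwards to see the result.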

You should be careful when handling missing data, i.e. if you have a bin with no data in it, the mean for that bin is NaN. In that case you can use fillna(0) to set all NaNs to 0.

plot_values = group.B.mean().fillna(0)
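For the data in the question, the same idea might look like the sketch below. I don't have final.csv, so the DataFrame here is random stand-in data; only the column names a and inc, the 30 to 80 range and the 80 bins come from the question, and the axis labels are my guess. In this version I leave empty bins as NaN instead of filling them with 0, since matplotlib then leaves a gap in the line rather than drawing it down to zero; which of the two you want depends on what an empty bin should mean in your plot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for pd.read_csv('final.csv'); the values are random
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.uniform(20, 100, 1000),   # semi-major axis
    "inc": rng.uniform(0, 30, 1000),   # inclination
})

cf = df[df["a"].between(30, 80)]

# 80 equal-width bins between 30 and 80 need 81 edges
bins = np.linspace(30, 80, 81)
centers = (bins[:-1] + bins[1:]) / 2

# One mean per bin; bins with no particles stay NaN
mean_inc = cf.groupby(pd.cut(cf["a"], bins), observed=False)["inc"].mean()

fig, ax = plt.subplots()
ax.plot(centers, mean_inc.to_numpy())
ax.set_xticks(np.arange(30, 81, 5))  # ticks every 5, regardless of bin width
ax.set_xlabel("semi-major axis a")
ax.set_ylabel("mean inclination")
```

The ticks no longer depend on the bins at all, so they don't crowd together when the window is small.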

Upvotes: 6
