Mikej

Reputation: 21

Pandas data frames and matplotlib.pyplot

I have created a dataframe that looks like the following:

Data frame #1

I have no problems plotting the data with the following:

import matplotlib.pyplot as plt

df_catch.plot(x='YY', y='ALB_C', kind='scatter',
              figsize=(12, 6), title='ALB catch/hooks')
plt.xlabel('Year')
plt.ylabel('ALB catch/hooks')
plt.show()

There are many rows of data covering many months and years. I want to aggregate the data down to just years (i.e. sum the monthly data for each year). I do this with the following:

name = df_catch.groupby('YY')
# Apply the sum function to the groupby object
df_year = name.sum()
df_year.head(5)

This yields mostly the expected results, except that the YY data are now the index, and anything I try in order to get a similar scatter plot throws errors.

Summed data

Question 1. Is there an elegant way to do the sums on the year data without getting the YY data as a new index? Also note that, the way I am doing this, I get sums of all data columns, including Latitude and Longitude, which I would like to avoid.

Question 2. If one of your data variables is the index, how do you make a scatter plot similar to the first code snippet above? I was able to get a line plot using the code below, but it is not really what I want.

plt.plot(df_year.index, df_year['ALB_C'])

Thanks very much in advance for your help. I am really new to python/pandas but like the functionality. I did go through the question search to find the answer, and I have looked at tutorials online. Again, thanks.

Upvotes: 2

Views: 158

Answers (2)

Scott Boston

Reputation: 153460

Question 1: Let's try

name = df_catch.groupby('YY', as_index=False)

or

name.sum().reset_index()

Question 2: Let's do this

plt.plot(df_year.index, df_year['ALB_C'], marker="o", linestyle='none')
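As a rough sketch of the same idea with plt-style scatter calls: the frame below is made-up sample data standing in for the summed df_year (only the YY and ALB_C names come from the question), and the Agg backend is selected only so the snippet runs without a display. Either pass the index directly as the x values, or move it back into a column and reuse the original DataFrame.plot call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the summed-by-year frame, with YY as the index
df_year = pd.DataFrame(
    {"ALB_C": [10.0, 12.5, 9.0]},
    index=pd.Index([1990, 1991, 1992], name="YY"),
)

# Option A: use the index directly as the x values of a scatter plot
fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(df_year.index, df_year["ALB_C"])
ax.set_xlabel("Year")
ax.set_ylabel("ALB catch/hooks")
ax.set_title("ALB catch/hooks")

# Option B: turn the index back into a column and reuse DataFrame.plot
ax2 = df_year.reset_index().plot(
    x="YY", y="ALB_C", kind="scatter", figsize=(12, 6), title="ALB catch/hooks"
)
```

Call plt.show() afterwards when working interactively.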

Upvotes: 1

jezrael

Reputation: 862471

There are two ways to convert the index back to a column.

Use reset_index:

name = df_catch.groupby('YY')
# Apply the sum function to the groupby object
df_year = name.sum().reset_index()
df_year.head(5)

Or pass as_index=False to groupby:

name = df_catch.groupby('YY', as_index=False)
# Apply the sum function to the groupby object
df_year = name.sum()
df_year.head(5)
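To also avoid summing columns such as Latitude and Longitude, one option is to select the column to aggregate before calling sum. This is only a sketch on made-up data: the MM, LAT and LON column names and all values are assumptions, since the real frame is shown only as an image.

```python
import pandas as pd

# Hypothetical sample data; MM, LAT and LON are assumed column names
df_catch = pd.DataFrame({
    "YY": [1990, 1990, 1991, 1991],
    "MM": [1, 2, 1, 2],
    "LAT": [-10.0, -10.0, -15.0, -15.0],
    "LON": [160.0, 160.0, 165.0, 165.0],
    "ALB_C": [1.0, 2.0, 3.0, 4.0],
})

# Group by year, keep YY as a regular column, and sum only ALB_C
df_year = df_catch.groupby("YY", as_index=False)["ALB_C"].sum()
print(df_year)
```

Because YY stays a regular column, the original df_catch.plot(x='YY', y='ALB_C', kind='scatter') call should work unchanged on df_year.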

Upvotes: 1
