Reputation: 21
I have created a dataframe that looks like the following:
I have no problems plotting the data with the following:
df_catch.plot(x='YY', y='ALB_C', kind='scatter',
figsize=(12,6), title='ALB catch/hooks')
plt.xlabel('Year')
plt.ylabel('ALB catch/hooks')
plt.show()
There are many rows of data from many months and years. I want to aggregate the data down to just years (i.e. sum the monthly data per year). I do this with the following:
name = df_catch.groupby('YY')
# Apply the sum function to the groupby object
df_year = name.sum()
df_year.head(5)
This yields mostly the expected results, except that the YY data are now the index, and anything I try to do to get a similar scatter plot throws errors.
Question 1. Is there an elegant way to do the sums on the year data without getting the YY data as a new index? Also note that the way I am doing this, I get sums of all data columns, including Latitude and Longitude, which I would like to avoid.
Question 2. If one of your data variables is the index, how do you make a scatter plot similar to the first code snippet above? I was able to get a line plot using the code below, but it is not really what I want.
plt.plot(df_year.index, df_year['ALB_C'])
Thanks very much in advance for your help. I am really new to python/pandas but like the functionality. I did search existing questions for an answer and have looked at tutorials online. Again, thanks.
Upvotes: 2
Views: 158
Reputation: 153460
Question 1: Let's try
name = df_catch.groupby('YY', as_index=False)
or
name.sum().reset_index()
Question 2: Let's do this
plt.plot(df_year.index, df_year['ALB_C'], marker="o", linestyle='none')
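If you would rather have a true scatter plot than a line plot with markers, something like the following should also work. This is only a sketch: the values in df_year are made up to stand in for the yearly sums, and the title and axis labels are copied from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly totals standing in for df_year (YY is the index after groupby/sum).
df_year = pd.DataFrame(
    {"ALB_C": [1.5, 2.0, 0.5]},
    index=pd.Index([1990, 1991, 1992], name="YY"),
)

fig, ax = plt.subplots(figsize=(12, 6))
# scatter() accepts the index directly as the x values.
points = ax.scatter(df_year.index, df_year["ALB_C"])
ax.set_title("ALB catch/hooks")
ax.set_xlabel("Year")
ax.set_ylabel("ALB catch/hooks")
# In an interactive session, call plt.show() here to display the figure.
```

Alternatively, df_year.reset_index() turns YY back into a column, after which the original df.plot(x='YY', y='ALB_C', kind='scatter') call from the question should work unchanged.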
Upvotes: 1
Reputation: 862471
There are 2 ways to convert the index to a column.
Use reset_index:
name = df_catch.groupby('YY')
# Apply the sum function to the groupby object
df_year = name.sum().reset_index()
df_year.head(5)
Or add the parameter as_index=False to groupby:
name = df_catch.groupby('YY', as_index=False)
# Apply the sum function to the groupby object
df_year = name.sum()
df_year.head(5)
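To avoid summing columns such as Latitude and Longitude, one option is to select the columns you want before calling sum. A minimal sketch, assuming made-up sample rows (the MM column and all values are invented for illustration; only YY, ALB_C, Latitude and Longitude are named in the question):

```python
import pandas as pd

# Made-up monthly rows; MM is an assumed column name.
df_catch = pd.DataFrame({
    "YY": [1990, 1990, 1991, 1991],
    "MM": [1, 2, 1, 2],
    "Latitude": [-10.0, -10.0, -15.0, -15.0],
    "Longitude": [150.0, 150.0, 155.0, 155.0],
    "ALB_C": [1.0, 2.0, 3.0, 4.5],
})

# Select the columns to aggregate before calling sum().
# With as_index=False, YY stays a regular column instead of becoming the index.
df_year = df_catch.groupby("YY", as_index=False)[["ALB_C"]].sum()
print(df_year)
```

The result has only the YY and ALB_C columns, so it can be passed straight to the plot call from the question.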
Upvotes: 1