Reputation: 311
The data is a time series, with many member ids associated with many categories:
data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                                 '2018-09-14 00:01:46',
                                 '2018-09-14 00:01:56',
                                 '2018-09-14 00:01:57',
                                 '2018-09-14 00:01:58',
                                 '2018-09-14 00:02:05'],
                        'category': [1, 1, 1, 2, 2, 2],
                        'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                        'data': ['23', '20', '20', '11', '16', '62']})
There are about 50 categories with 30 members, each with around 1000 datapoints.
I am trying to make one plot per category by subsetting each category and then plotting via:
fig, ax = plt.subplots(figsize=(8,6))
for i, g in category.groupby(['member']):
    g.plot(y='data', ax=ax, label=str(i))
plt.show()
This works fine for a single category. However, when I try to use a for loop to repeat this for each category, it does not work:
tests = pd.DataFrame()
for category in categories:
    tests = df.loc[df['category'] == category]
    for test in tests:
        fig, ax = plt.subplots(figsize=(8,6))
        for i, g in category.groupby(['member']):
            g.plot(y='data', ax=ax, label=str(i))
        plt.show()
This yields the error "AttributeError: 'str' object has no attribute 'groupby'".
What I would like is a loop that produces one graph per category, with all the members' data plotted on each graph.
Upvotes: 1
Views: 221
Reputation: 648
Creating your dataframe:
import pandas as pd
data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                                 '2018-09-14 00:01:46',
                                 '2018-09-14 00:01:56',
                                 '2018-09-14 00:01:57',
                                 '2018-09-14 00:01:58',
                                 '2018-09-14 00:02:05'],
                        'category': [1, 1, 1, 2, 2, 2],
                        'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                        'data': ['23', '20', '20', '11', '16', '62']})
Then (edited after comments):
import matplotlib.pyplot as plt
import numpy as np
# Choose a near-square grid with enough cells for one subplot per category
subplots_n = np.unique(data_df['category']).size
subplots_x = np.round(np.sqrt(subplots_n)).astype(int)
subplots_y = np.ceil(np.sqrt(subplots_n)).astype(int)
# Iterating over a groupby yields (key, sub-DataFrame) pairs
for i, category in enumerate(data_df.groupby('category')):
    category_df = pd.DataFrame(category[1])
    x = [str(x) for x in category_df['member']]
    y = [float(x) for x in category_df['data']]
    plt.subplot(subplots_x, subplots_y, i+1)
    plt.plot(x, y)
    plt.title("Category {}".format(category_df['category'].values[0]))
plt.tight_layout()
plt.show()
This yields one subplot per category.
Note that this also handles a larger number of categories, such as
data_df2 = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5],
                         'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe', 'ric', 'mat', 'pip', 'zoe', 'qui', 'quo', 'qua'],
                         'data': ['23', '20', '20', '11', '16', '62', '34', '27', '12', '7', '9', '13', '7']})
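If separate figures are preferred over a grid of subplots, as the question asks, a nested groupby is one option. The following is only a minimal sketch: it assumes that the 'data' column can be converted to float and that 'Date' can be parsed as datetimes, the helper name plot_per_category is made up for illustration, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                                 '2018-09-14 00:01:46',
                                 '2018-09-14 00:01:56',
                                 '2018-09-14 00:01:57',
                                 '2018-09-14 00:01:58',
                                 '2018-09-14 00:02:05'],
                        'category': [1, 1, 1, 2, 2, 2],
                        'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                        'data': ['23', '20', '20', '11', '16', '62']})


def plot_per_category(df):
    """Hypothetical helper: one figure per category, one line per member."""
    df = df.copy()
    df['Date'] = pd.to_datetime(df['Date'])  # assumed to be parseable timestamps
    df['data'] = df['data'].astype(float)    # assumed to be numeric strings
    figures = {}
    # Outer loop: one figure per category
    for category, category_df in df.groupby('category'):
        fig, ax = plt.subplots(figsize=(8, 6))
        # Inner loop: one line per member on the same axes
        for member, member_df in category_df.groupby('member'):
            ax.plot(member_df['Date'], member_df['data'],
                    marker='o', label=str(member))
        ax.set_title('Category {}'.format(category))
        ax.legend()
        figures[category] = fig
    return figures


figures = plot_per_category(data_df)
```

Calling plt.show() afterwards displays all figures in an interactive session. With around 50 categories it may be more practical to save each figure with fig.savefig(...) and close it with plt.close(fig) inside the loop, since matplotlib warns when many figures are open at once.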
Upvotes: 1
Reputation: 4547
I am far from an expert with pandas, but you can execute the following simple snippet:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
fig, ax = plt.subplots()
for item in df.groupby('category'):
    ax.plot([float(x) for x in item[1]['category']],
            [float(x) for x in item[1]['data'].values],
            linestyle='none', marker='D')
plt.show()
But there is probably a better way.
EDIT: Based on the changes made to your question, I changed my snippet to
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
fig, ax = plt.subplots(nrows=np.unique(df['category']).size)
for i, item in enumerate(df.groupby('category')):
    ax[i].plot([str(x) for x in item[1]['Id']],
               [float(x) for x in item[1]['data'].values],
               linestyle='none', marker='D')
    ax[i].set_title('Category {}'.format(item[1]['category'].values[0]))
fig.tight_layout()
plt.show()
which now displays one subplot per category, stacked vertically.
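As for the AttributeError in the question: it comes from calling .groupby on the loop variable category, which holds a category label (a string, judging by the message), instead of on the filtered DataFrame. In addition, iterating over a DataFrame directly yields its column names, not rows or sub-frames. Here is a small sketch, using only the sample data from above and plain pandas, of what each kind of loop produces:

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})

# Iterating over a DataFrame yields its column labels (plain strings here),
# so a loop such as "for test in tests" never sees a sub-DataFrame.
column_labels = [label for label in df]

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# which is why the snippets above index each pair with item[1].
group_sizes = {key: len(group) for key, group in df.groupby('category')}
```

So in a loop over categories, .groupby has to be called on the sub-DataFrame, not on the key.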
Upvotes: 1