Reputation: 1088
I do have health diagnosis data for last year and I did like get count of diagnosis for each month. Here is my data:
import pandas as pd
cars2 = {'ID': [22,100,47,35,60],
'Date': ['2020-04-11','2021-04-12','2020-05-13','2020-05-14', '2020-06-15'],
'diagnosis': ['bacteria sepsis','bacteria sepsis','Sepsis','Risk sepsis','Neonatal sepsis'],
'outcome': ['alive','alive','dead','alive','dead']
}
df2 = pd.DataFrame(cars2, columns = ['ID','Date', 'diagnosis', 'outcome'])
print (df2)
How can I get diagnosis counts for each month. Example is how many diagnosis of bacteria sepsis we had for that month. Final result is a table showing value counts of diagnosis for each month
Upvotes: 0
Views: 346
Reputation: 13831
If you want to see results per month, you can use pivot_table
.
df2.pivot_table(index=['outcome','diagnosis'], columns=pd.to_datetime(df2['Date']).dt.month, aggfunc='size', fill_value=0)
Date 4 5 6
outcome diagnosis
alive Risk sepsis 0 1 0
bacteria sepsis 2 0 0
dead Neonatal sepsis 0 0 1
Sepsis 0 1 0
4,5,6 are the months in your dataset.
Try playing around with the parameters here, you might be able to land on a better view that suits your ideal result better.
Upvotes: 2
Reputation: 23217
I think what you mean by for each month is not only mean month figure only, but year-month combination. As such, let's approach as follows:
First, we create a 'year-month' column according to the Date
column. Then use .groupby()
on this new year-month
column and get .value_counts()
on column diagnosis
, as follows:
df2['year-month'] = pd.to_datetime(df2['Date']).dt.strftime("%Y-%m")
df2.groupby('year-month')['diagnosis'].value_counts().to_frame(name='Count').reset_index()
Result:
year-month diagnosis Count
0 2020-04 bacteria sepsis 1
1 2020-05 Risk sepsis 1
2 2020-05 Sepsis 1
3 2020-06 Neonatal sepsis 1
4 2021-04 bacteria sepsis 1
Upvotes: 1
Reputation: 377
I modified your dataframe by setting the Date
column as index:
import pandas as pd
cars2 = {'ID': [22,100,47,35,60],
'Date': ['2020-04-11','2021-04-12','2020-05-13','2020-05-14', '2020-06-15'],
'diagnosis': ['bacteria sepsis','bacteria sepsis','Sepsis','Risk sepsis','Neonatal sepsis'],
'outcome': ['alive','alive','dead','alive','dead']
}
df2 = pd.DataFrame(cars2, columns = ['ID','Date', 'diagnosis', 'outcome'])
df2.index = pd.to_datetime(df2['Date']) # <--- I set your Date column as the index (also convert it to datetime)
df2.drop('Date',inplace=True, axis=1) # <--- Drop the Date column
print (df2)
if you groupby
the dataframe by a pd.Grouper and the columns you want to group with (diagnosis
and outcome
):
df2.groupby([pd.Grouper(freq='M'), 'diagnosis','outcome']).count()
Output:
ID
Date diagnosis outcome
2020-04-30 bacteria sepsis alive 1
2020-05-31 Risk sepsis alive 1
Sepsis dead 1
2020-06-30 Neonatal sepsis dead 1
2021-04-30 bacteria sepsis alive 1
Note: the freq='M'
in pd.Grouper
groups the dataframe by month. Read more about the freq
attribute here
Edit: Assigning the grouped dataframe to new_df
and resetting the other indices except Date
:
new_df = df2.groupby([pd.Grouper(freq='M'), 'diagnosis','outcome']).count()
new_df.reset_index(level=[1,2],inplace=True)
Iterate over each month and get the table separately inside df_list
:
df_list = [] # <--- this will contain each separate table for each month
for month in np.unique(new_df.index):
df_list += [pd.DataFrame(new_df.loc[[month]])]
df_list[0] # <-- get the first dataframe in df_list
will return:
diagnosis outcome ID
Date
2020-04-30 bacteria sepsis alive 1
Upvotes: 1
Reputation: 9257
First you need to create a month
variable through to_datetime()
function; then you can group by the month and make a value_counts()
within the month
import pandas as pd
df2['month'] = pd.to_datetime(df2['Date']).dt.month
df2.groupby('month').apply(lambda x: x['diagnosis'].value_counts())
month
4 bacteria sepsis 2
5 Risk sepsis 1
Sepsis 1
6 Neonatal sepsis 1
Name: diagnosis, dtype: int64
Upvotes: 1