Reputation: 1088

Summary of data for each month

I do have health diagnosis data for last year and I did like get count of diagnosis for each month. Here is my data:

import pandas as pd

cars2 = {'ID': [22,100,47,35,60],
    'Date': ['2020-04-11','2021-04-12','2020-05-13','2020-05-14', '2020-06-15'],
        'diagnosis': ['bacteria sepsis','bacteria sepsis','Sepsis','Risk sepsis','Neonatal sepsis'],
        'outcome': ['alive','alive','dead','alive','dead']
        
        }

df2 = pd.DataFrame(cars2, columns = ['ID','Date', 'diagnosis', 'outcome'])

print (df2)

How can I get diagnosis counts for each month. Example is how many diagnosis of bacteria sepsis we had for that month. Final result is a table showing value counts of diagnosis for each month

Upvotes: 0

Answers (4)

sophocles

Reputation: 13831

If you want to see results per month, you can use pivot_table.

df2.pivot_table(index=['outcome','diagnosis'], columns=pd.to_datetime(df2['Date']).dt.month, aggfunc='size', fill_value=0)

Date                     4  5  6
outcome diagnosis               
alive   Risk sepsis      0  1  0
        bacteria sepsis  2  0  0
dead    Neonatal sepsis  0  0  1
        Sepsis           0  1  0

4,5,6 are the months in your dataset.

Try playing around with the parameters here, you might be able to land on a better view that suits your ideal result better.

Upvotes: 2

SeaBean

Reputation: 23217

I think what you mean by for each month is not only mean month figure only, but year-month combination. As such, let's approach as follows:

First, we create a 'year-month' column according to the Date column. Then use .groupby() on this new year-month column and get .value_counts() on column diagnosis, as follows:

df2['year-month'] = pd.to_datetime(df2['Date']).dt.strftime("%Y-%m")
df2.groupby('year-month')['diagnosis'].value_counts().to_frame(name='Count').reset_index()

Result:

  year-month        diagnosis  Count
0    2020-04  bacteria sepsis      1
1    2020-05      Risk sepsis      1
2    2020-05           Sepsis      1
3    2020-06  Neonatal sepsis      1
4    2021-04  bacteria sepsis      1

Upvotes: 1

meowulf

Reputation: 377

I modified your dataframe by setting the Date column as index:

import pandas as pd

cars2 = {'ID': [22,100,47,35,60],
    'Date': ['2020-04-11','2021-04-12','2020-05-13','2020-05-14', '2020-06-15'],
        'diagnosis': ['bacteria sepsis','bacteria sepsis','Sepsis','Risk sepsis','Neonatal sepsis'],
        'outcome': ['alive','alive','dead','alive','dead']
        }

df2 = pd.DataFrame(cars2, columns = ['ID','Date', 'diagnosis', 'outcome'])
df2.index = pd.to_datetime(df2['Date']) # <--- I set your Date column as the index (also convert it to datetime)
df2.drop('Date',inplace=True, axis=1) # <--- Drop the Date column
print (df2)

if you groupby the dataframe by a pd.Grouper and the columns you want to group with (diagnosis and outcome):

df2.groupby([pd.Grouper(freq='M'), 'diagnosis','outcome']).count()

Output:

                                    ID
Date       diagnosis       outcome    
2020-04-30 bacteria sepsis alive     1
2020-05-31 Risk sepsis     alive     1
           Sepsis          dead      1
2020-06-30 Neonatal sepsis dead      1
2021-04-30 bacteria sepsis alive     1

Note: the freq='M' in pd.Grouper groups the dataframe by month. Read more about the freq attribute here

Edit: Assigning the grouped dataframe to new_df and resetting the other indices except Date:

new_df = df2.groupby([pd.Grouper(freq='M'), 'diagnosis','outcome']).count()
new_df.reset_index(level=[1,2],inplace=True)

Iterate over each month and get the table separately inside df_list:

df_list = [] # <--- this will contain each separate table for each month
for month in np.unique(new_df.index):
    df_list += [pd.DataFrame(new_df.loc[[month]])]
df_list[0] # <-- get the first dataframe in df_list

will return:

                  diagnosis outcome  ID
Date                                   
2020-04-30  bacteria sepsis   alive   1

Upvotes: 1

Ric S

Reputation: 9257

First you need to create a month variable through to_datetime() function; then you can group by the month and make a value_counts() within the month

import pandas as pd

df2['month'] = pd.to_datetime(df2['Date']).dt.month
df2.groupby('month').apply(lambda x: x['diagnosis'].value_counts())

month                 
4      bacteria sepsis    2
5      Risk sepsis        1
       Sepsis             1
6      Neonatal sepsis    1
Name: diagnosis, dtype: int64

Upvotes: 1

Summary of data for each month

Answers (4)

Related Questions