Reputation: 884
I have a time series data set of multiple IDs and multiple variables, each variable has 3 time series entries - "baseline", "3 month", "6 month". The dataframe is structured like this, df =
import pandas as pd
data = {'Patient ID': [11111, 11111, 11111, 11111, 22222, 22222, 22222, 22222, 33333, 33333, 33333, 33333, 44444, 44444, 44444, 44444, 55555, 55555, 55555, 55555],
'Lab Attribute': ['% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)'],
'Baseline': [46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0],
'3 Month': [23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0],
'6 Month': [34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0]}
df = pd.DataFrame(data)
Patient ID Lab Attribute Baseline 3 Month 6 Month
0 11111 % Saturation- Iron 46.0 23.0 34.0
1 11111 ALK PHOS 94.0 82.0 65.0
2 11111 ALT(SGPT) 21.0 13.0 10.0
3 11111 AST (SGOT) 18.0 17.0 14.0
4 22222 % Saturation- Iron 46.0 23.0 34.0
5 22222 ALK PHOS 94.0 82.0 65.0
6 22222 ALT(SGPT) 21.0 13.0 10.0
7 22222 AST (SGOT) 18.0 17.0 14.0
8 33333 % Saturation- Iron 46.0 23.0 34.0
9 33333 ALK PHOS 94.0 82.0 65.0
10 33333 ALT(SGPT) 21.0 13.0 10.0
11 33333 AST (SGOT) 18.0 17.0 14.0
12 44444 % Saturation- Iron 46.0 23.0 34.0
13 44444 ALK PHOS 94.0 82.0 65.0
14 44444 ALT(SGPT) 21.0 13.0 10.0
15 44444 AST (SGOT) 18.0 17.0 14.0
16 55555 % Saturation- Iron 46.0 23.0 34.0
17 55555 ALK PHOS 94.0 82.0 65.0
18 55555 ALT(SGPT) 21.0 13.0 10.0
19 55555 AST (SGOT) 18.0 17.0 14.0
What I'm trying to do is group by the ID and Lab Attribute and create a plot of each "Lab Attribute" - % Saturation- Iron, ALK PHOS, etc., that will include the time series for all of the Patient IDs.
So based on the example data there would be 4 plots - % Saturation- Iron, ALK PHOS, etc., that each would contain 5 traces (1 for each ID).
I tried using groupby per this article - Creating a time-series plot with data in long format in python?
Though it just plots everything on one chart.
This is the code I have so far:
df_labs = pd.read_csv("/Users/johnconor/Documents/Python/gut_microbiome/out/nw_labs_up_to_6mon.csv")
df_labs = df_labs.fillna(method='ffill')
dfl = df_labs.groupby(['Patient_ID', 'Lab_Attribute'])['Baseline','3 Month','6 Month'].sum().plot()
This was the outcome:
[![enter image description here][1]][1]
Part of the problem I'm having is that all of the examples I can find have long format data with only 1 values column. Not a number of values over time.
I also tried to utilize the approach for multiple plots per this article - Creating a time-series plot with data in long format in python?
n_ids = df_labs.Patient_ID.unique().size
n_cols = int(n_ids ** 0.5)
n_rows = n_cols + (1 if n_ids % n_cols else 0)
fig, axes = plt.subplots(n_rows, n_cols)
axes = axes.ravel()
for i, (id, att, base,three,six) in enumerate(df_labs.groupby(['Patient_ID', 'Lab_Attribute'])['Baseline','3 Month','6 Month'].sum().reset_index()):
print(idx)
series.plot(ax=axes[i], title=f"ID:{idx}")
fig.tight_layout()
Though am running into issues as it is again designed for only 1 set of values. Producing the error:
ValueError: too many values to unpack (expected 5)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-40-42cf5bc14bdb> in <module>
4 fig, axes = plt.subplots(n_rows, n_cols)
5 axes = axes.ravel()
----> 6 for i, (id, att, base,three,six) in enumerate(df_labs.groupby(['Patient_ID', 'Lab_Attribute'])['Baseline','3 Month','6 Month'].sum().reset_index()):
7 print(idx)
8 series.plot(ax=axes[i], title=f"ID:{idx}")
ValueError: too many values to unpack (expected 5)
Upvotes: 2
Views: 304
Reputation: 62513
.melt
, which will allow the month to be used as a time axis.seaborn.relplot
with kind='line'
, to create the visualization.
col
, row
, and/or hue
to adjust how the data should be grouped. Do not change x
and y
.y
, see Prevent Sharing of Y Axes in Seaborn Relplotimport pandas as pd
import seaborn as sns
# reshape the dataframe
dfm = df.melt(id_vars=['Patient ID', 'Lab Attribute'], var_name='Months')
# change the Months values to numeric
dfm.Months = dfm.Months.map({'Baseline': 0, '3 Month': 3, '6 Month': 6})
# display(dfm.head())
Patient ID Lab Attribute Months value
0 11111 % Saturation- Iron 0 46.0
1 11111 ALK PHOS 0 94.0
2 11111 ALT(SGPT) 0 21.0
3 11111 AST (SGOT) 0 18.0
4 22222 % Saturation- Iron 0 46.0
# plot a figure level line plot with seaborn
p = sns.relplot(data=dfm, col='Lab Attribute', x='Months', y='value', hue='Patient ID', kind='line', col_wrap=4, marker='o', palette='husl')
seaborn.catplot
with kind='bar'
for a bar plot visualizationp = sns.catplot(data=dfm, col='Lab Attribute', x='Months', y='value', hue='Patient ID', kind='bar', col_wrap=4)
Upvotes: 3