Indika Rajapaksha

Reputation: 1168

Align pandas dataframes as panels

I have 12 dataframes of the same shape, one for each of 12 years of data collection. I need to use them as a panel to plot the various column values along the time axis (years). Hence, I think I should align these frames as panels.

  1. Can someone show me how to align dataframes as panels?
  2. Is this the correct way to prepare the data for plotting along a third dimension?


Some sample data:

# for 2015
Grave Crimes    Cases Recorded  Mistake of Law fact
Abduction       725             3
Kidnapping      246             6
Arson           466             1
Mischief        436             1
House Breaking  12707           21
Grievous Hurt   1299            3

# for 2016
Grave Crimes    Cases Recorded  Mistake of Law fact
Abduction       738             4
Kidnapping      297             9
Arson           486             4
Mischief        394             1
House Breaking  10287           14
Grievous Hurt   1205            0

# for 2017
Grave Crimes    Cases Recorded  Mistake of Law fact
Abduction       647             2
Kidnapping      251             10
Arson           418             3
Mischief        424             0
House Breaking  8913            12
Grievous Hurt   1075            1

Upvotes: 2

Views: 573

Answers (2)

SpghttCd

Reputation: 10860

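Assuming your DataFrames are named df15, df16 and df17, you could create a panel from them as shown below. Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this only works on older pandas versions: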

pnl = pd.Panel({2015: df15, 2016: df16, 2017: df17})

After that, you could create the 3D plot you mentioned in your question in the following way:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

for i in range(2015, 2018):
    ax.bar(pnl.major_axis.values, pnl[i]['Cases Recorded'], zdir='y', zs=i)

ax.yaxis.set_ticks(range(2015, 2018))
ax.yaxis.set_ticklabels(range(2015, 2018))

Figure: example 3D plot of your data
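On pandas versions where Panel no longer exists, one possible stand-in is a plain dict of frames keyed by year, optionally stacked into a 3D NumPy array. The following is only a sketch under assumptions: `make_frame`, `frames` and `cube` are hypothetical names, and the values are the sample data from the question.

```python
import numpy as np
import pandas as pd

crimes = ['Abduction', 'Kidnapping', 'Arson', 'Mischief',
          'House Breaking', 'Grievous Hurt']
cols = ['Cases Recorded', 'Mistake of Law fact']

def make_frame(cases, mistakes):
    """Hypothetical helper: build one yearly frame from the sample values."""
    return pd.DataFrame({'Grave Crimes': crimes,
                         cols[0]: cases,
                         cols[1]: mistakes})

# A dict keyed by year plays the role of the panel's items axis
frames = {
    2015: make_frame([725, 246, 466, 436, 12707, 1299], [3, 6, 1, 1, 21, 3]),
    2016: make_frame([738, 297, 486, 394, 10287, 1205], [4, 9, 4, 1, 14, 0]),
    2017: make_frame([647, 251, 418, 424, 8913, 1075], [2, 10, 3, 0, 12, 1]),
}

years = sorted(frames)
# Stack the numeric columns: axis 0 = year, axis 1 = crime, axis 2 = column
cube = np.stack([frames[y][cols].to_numpy() for y in years])

# Same access pattern as pnl[i]['Cases Recorded'] in the loop above
cases_2016 = frames[2016]['Cases Recorded']
```

With this layout, `frames[i]['Cases Recorded']` can replace `pnl[i]['Cases Recorded']` in the plotting loop, and `frames[2015].index.values` can stand in for `pnl.major_axis.values`.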

However, if I may offer a suggestion about readable data visualization, based on my own experience and one I think many professionals would share:

Even if a dataset has three or more dimensions, a well-designed 2D plot is often the better choice. A 3D plot may be an eye-catcher, but to inform the target audience and show specific properties of the data, you will almost always go with 2D. With this in mind, Ami Tavory's approach is the better way to go, as the data structure is then easier to handle:

df15['year'] = 2015
df16['year'] = 2016
df17['year'] = 2017
df = pd.concat([df15, df16, df17]).set_index(['Grave Crimes', 'year'])

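import numpy as np

f, ax = plt.subplots(1)
for i, y in enumerate(range(2015, 2018)):
    # select the rows for year y; the remaining index is 'Grave Crimes'
    data = df.xs(y, level='year')['Cases Recorded']
    ax.bar(np.arange(len(data))+.2*i, data.values, width=.2, label=str(y))
ax.legend()
# place one tick under the middle bar of each group before labelling
ax.set_xticks(np.arange(len(data))+.2)
ax.set_xticklabels(data.index, rotation=15)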

Figure: example 2D plot of your data

Upvotes: 1

Ami Tavory

Reputation: 76297

While panels allow adding dimensions, hierarchical indexing is a more common replacement. E.g., from Python Data Science Handbook:

While Pandas does provide Panel and Panel4D objects that natively handle three-dimensional and four-dimensional data (see Aside: Panel Data), a far more common pattern in practice is to make use of hierarchical indexing (also known as multi-indexing) to incorporate multiple index levels within a single index. In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects.

In your case

I have 12 dataframes of the same shape for 12 years of data collection. I need to use this as a panel to plot the various column values across the time series axis (years).

Say your DataFrames are in df_2015, df_2016 and df_2017. You can do the following:

df_2015['year'] = 2015
df_2016['year'] = 2016
df_2017['year'] = 2017
df = pd.concat([df_2015, df_2016, df_2017]).set_index(['Grave Crimes', 'year'])
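With all 12 frames, assigning the year by hand gets repetitive. A possible shortcut, sketched here under assumptions, is to keep the frames in a dict keyed by year and let `pd.concat` build the `year` level from the keys. `make_frame` and `frames` are hypothetical names, and only two crimes from the sample data are shown for brevity.

```python
import pandas as pd

def make_frame(cases, mistakes):
    """Hypothetical helper: a two-row excerpt of one yearly frame."""
    return pd.DataFrame({
        'Grave Crimes': ['Abduction', 'Kidnapping'],
        'Cases Recorded': cases,
        'Mistake of Law fact': mistakes,
    })

frames = {
    2015: make_frame([725, 246], [3, 6]),
    2016: make_frame([738, 297], [4, 9]),
    2017: make_frame([647, 251], [2, 10]),
}

# The dict keys become the outer index level, named 'year';
# swaplevel() then puts 'Grave Crimes' first, as in the frame built above
df = (
    pd.concat({y: f.set_index('Grave Crimes') for y, f in frames.items()},
              names=['year'])
      .swaplevel()
      .sort_index()
)
```

The result has the same ('Grave Crimes', 'year') index as the version built with explicit `year` columns, so the selection shown next applies unchanged.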

Now to get the data across all years for 'Abduction', for example, use

df[df.index.get_level_values(0) == 'Abduction']
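Two other ways to slice the same structure may be handy for plotting across years: `xs` selects one crime and drops that level, and `unstack` pivots the crimes into columns so that each row is a year. This is only a sketch; the small `df` below is rebuilt from a subset of the sample data so the snippet runs on its own.

```python
import pandas as pd

# Subset of the sample data, indexed like the concatenated frame above
df = pd.DataFrame({
    'Grave Crimes': ['Abduction', 'Arson'] * 3,
    'year': [2015, 2015, 2016, 2016, 2017, 2017],
    'Cases Recorded': [725, 466, 738, 486, 647, 418],
}).set_index(['Grave Crimes', 'year'])

# One crime across all years; the remaining index is 'year'
abduction = df.xs('Abduction', level='Grave Crimes')

# Years as rows, crimes as columns: a convenient shape for line plots
wide = df['Cases Recorded'].unstack('Grave Crimes')
# wide.plot(marker='o') would draw one line per crime (requires matplotlib)
```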

Upvotes: 1
