Sylv99
Sylv99

Reputation: 185

How to transform a wide dataframe to plot specific data

I have a dataframe as below:

import pandas as pd

# sample data
# sample data
data = {'RowName': ['A1', 'A2', 'A3'], '20/09/21 (LP%)': [120, 0, 0], '20/09/21 (Vol)': [13, 1, 31], '20/09/21 (OI)': [0, 0, 0], '21/09/21 (LP%)': [135.0, 143.43, 143.43], '21/09/21 (Vol)': [68.6, 63.81, 58.1], '21/09/21 (OI)': [0, 0, 0], '22/09/21 (LP%)': [130, 0, 0], '22/09/21 (Vol)': [0, 0, 0], '22/09/21 (OI)': [75, 80, 85]}
df = pd.DataFrame(data)

# display(df)
  RowName  20/09/21 (LP%)  20/09/21 (Vol)  20/09/21 (OI)  21/09/21 (LP%)  21/09/21 (Vol)  21/09/21 (OI)  22/09/21 (LP%)  22/09/21 (Vol)  22/09/21 (OI)
0      A1             120              13              0          135.00           68.60              0             130               0             75
1      A2               0               1              0          143.43           63.81              0               0               0             80
2      A3               0              31              0          143.43           58.10              0               0               0             85

Using matplotlib and the following dataframe in pandas, is there a way to plot:

Something like this:

enter image description here

Upvotes: 1

Views: 445

Answers (3)

Trenton McKinney
Trenton McKinney

Reputation: 62513

  • The issue with the current implementation is you only get a result for 'A1' and then presumably you have repeat the code to get the other data.
  • pandas.wide_to_long will allow for all the columns with a stub following the date, to be transformed.
    • This requires that the stub be at the front, which can be adjusted with a list comprehension on the column names.
    • suffix='\d+\/\d+\/\d+' matches the date format to be extracted.
  • Once the dataframe is transformed, it's easier to use and access all of the data.
  • Use pandas.DataFrame.pivot to reshape the data to plot '(LP%)'
  • Plot directly with pandas.DataFrame.plot
    • The line plot x-axis data format is %M-%D %H because there is only a small amount data. The format will update when there are more days of data.
    • The index will be the x-axis, so it's not necessary to specify x=.
    • Use y= to specify column names, or don't use it and all the columns will be plotted.
  • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
import pandas as pd

# sample data
data = {'RowName': ['A1', 'A2', 'A3'], '20/09/21 (LP%)': [120, 0, 0], '20/09/21 (Vol)': [13, 1, 31], '20/09/21 (OI)': [0, 0, 0], '21/09/21 (LP%)': [135.0, 143.43, 143.43], '21/09/21 (Vol)': [68.6, 63.81, 58.1], '21/09/21 (OI)': [0, 0, 0], '22/09/21 (LP%)': [130, 0, 0], '22/09/21 (Vol)': [0, 0, 0], '22/09/21 (OI)': [75, 80, 85]}
df = pd.DataFrame(data)

# adjust the column names so the stub is before the date
df.columns = [' '.join(col.split(' ')[::-1]) if len(col.split(' ')) > 1 else col for col in df.columns]

# convert to long form
df = pd.wide_to_long(df, stubnames=['(LP%)', '(Vol)', '(OI)'], i=['RowName'], sep=' ', j='Date', suffix='\d+\/\d+\/\d+').reset_index()

# convert the Date column to a datetime dtype
df.Date = pd.to_datetime(df.Date).dt.date

# display(df.head())
  RowName        Date   (LP%)  (Vol)  (OI)
0      A1  2021-09-20  120.00  13.00     0
1      A2  2021-09-20    0.00   1.00     0
2      A3  2021-09-20    0.00  31.00     0
3      A1  2021-09-21  135.00  68.60     0
4      A2  2021-09-21  143.43  63.81     0

# pivot the LP% column
dlp = df.pivot(index='Date', columns='RowName', values='(LP%)')

# display(dlp)
RowName        A1      A2      A3
Date                             
2021-09-20  120.0    0.00    0.00
2021-09-21  135.0  143.43  143.43
2021-09-22  130.0    0.00    0.00

# plot any specific column
dlp.plot(y=['A1'], rot=0, figsize=(10, 5), marker='o', title='LP%: A1')

enter image description here

# or all 3
dlp.plot(rot=0, figsize=(10, 5), marker='o', title='LP%')
  • The 'A3' line covers the 'A2' line because the data is the same

enter image description here

# specify the kind parameter for a bar plot
dlp.plot(kind='bar', rot=0, figsize=(10, 5), title='LP%')

enter image description here

  • Add one more step and use pandas.DataFrame.melt to reshape the df into a completely long form to easily plot all of the data with seaborn.relplot.
    • seaborn is a high-level API for matplotlib.
import seaborn as sns

# melt
dfm = df.melt(id_vars=['RowName', 'Date'])

# display(dfm.head())
  RowName        Date variable   value
0      A1  2021-09-20    (LP%)  120.00
1      A2  2021-09-20    (LP%)    0.00
2      A3  2021-09-20    (LP%)    0.00
3      A1  2021-09-21    (LP%)  135.00
4      A2  2021-09-21    (LP%)  143.43

# plot
sns.relplot(kind='line', data=dfm, x='Date', y='value', marker='o', row='variable', hue='RowName', height=3.5, aspect=2.25)

enter image description here

Upvotes: 2

alexandra
alexandra

Reputation: 1192

Looking at your dataframe (assume it is named df), I think the quickest way to do this is to transpose it, seeing as you want to use rows as columns:

df_trans = df.transpose() #or df.T
df_trans.plot.line(x='RowName', y='A1', color='red')
import matplotlib.pyplot as plt
plt.show() 

Maybe you want to rename the 'Rowname' column:

df_trans.rename({0: 'LP'}, axis=1)
df_trans.plot.line(x='LP', y='A1', color='red')
import matplotlib.pyplot as plt
plt.show()

Upvotes: 1

Zephyr
Zephyr

Reputation: 12506

You need to reshape your dataframe firstly. Starting from a dataframe like this:

  RowName  20/09/21 (LP%)  20/09/21 (Vol)  20/09/21 (OI)  21/09/21 (LP%)  21/09/21 (Vol)  21/09/21 (OI)  22/09/21 (LP%)  22/09/21 (Vol)  22/09/21 (OI)
0      A1             120              13              0          135.00           68.60              0             130               0             75
1      A2               0               1              0          143.43           63.81              0               0               0             80
2      A3               0              31              0          143.43           58.10              0               0               0             85

You can re-shape with:

# row filter
df = df.iloc[:1, :]

# column filter and transpose
df = df[[col for col in df.columns if '(LP%)' in col]].T

# convert column name to datetime value
df.index = pd.to_datetime(df.index.map(lambda x: x.split(' ')[0]))

# pass a sting as label
df.columns = ['value']

Then you can simply plot with:

df.plot()

plt.show()

Complete Code

import pandas as pd
import matplotlib.pyplot as plt


df = pd.read_csv(r'data/data.csv')


df = df.iloc[:1, :]
df = df[[col for col in df.columns if '(LP%)' in col]].T
df.index = pd.to_datetime(df.index.map(lambda x: x.split(' ')[0]))
df.columns = ['value']

df.plot()

plt.show()

enter image description here


You can further customize datetime axis with:

import matplotlib.dates as md

fig, ax = plt.subplots()

ax.plot(df.index, df['value'])

ax.set_xlabel('Date')
ax.set_ylabel('LP Value')

ax.xaxis.set_major_locator(md.DayLocator(interval = 1))
ax.xaxis.set_major_formatter(md.DateFormatter('%Y-%m-%d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation = 0)
ax.set_xlim([df.index[0], df.index[-1]])

plt.show()

enter image description here

Upvotes: 1

Related Questions