Reputation: 185

How to transform a wide dataframe to plot specific data

I have a dataframe as below:

import pandas as pd

# sample data
# sample data
data = {'RowName': ['A1', 'A2', 'A3'], '20/09/21 (LP%)': [120, 0, 0], '20/09/21 (Vol)': [13, 1, 31], '20/09/21 (OI)': [0, 0, 0], '21/09/21 (LP%)': [135.0, 143.43, 143.43], '21/09/21 (Vol)': [68.6, 63.81, 58.1], '21/09/21 (OI)': [0, 0, 0], '22/09/21 (LP%)': [130, 0, 0], '22/09/21 (Vol)': [0, 0, 0], '22/09/21 (OI)': [75, 80, 85]}
df = pd.DataFrame(data)

# display(df)
  RowName  20/09/21 (LP%)  20/09/21 (Vol)  20/09/21 (OI)  21/09/21 (LP%)  21/09/21 (Vol)  21/09/21 (OI)  22/09/21 (LP%)  22/09/21 (Vol)  22/09/21 (OI)
0      A1             120              13              0          135.00           68.60              0             130               0             75
1      A2               0               1              0          143.43           63.81              0               0               0             80
2      A3               0              31              0          143.43           58.10              0               0               0             85

Using matplotlib and the following dataframe in pandas, is there a way to plot:

x axis : columns that have (LP%) in them - take the date and use each of these columns as a point on the x-axis

I have selected columns using this:
```
df2 = df.filter(regex='LP%')
```
y axis : the actual value for row A1 only

Something like this:

Upvotes: 1

Answers (3)

Trenton McKinney

Reputation: 62513

The issue with the current implementation is you only get a result for 'A1' and then presumably you have repeat the code to get the other data.
pandas.wide_to_long will allow for all the columns with a stub following the date, to be transformed.
- This requires that the stub be at the front, which can be adjusted with a list comprehension on the column names.
- suffix='\d+\/\d+\/\d+' matches the date format to be extracted.
Once the dataframe is transformed, it's easier to use and access all of the data.
Use pandas.DataFrame.pivot to reshape the data to plot '(LP%)'
Plot directly with pandas.DataFrame.plot
- The line plot x-axis data format is %M-%D %H because there is only a small amount data. The format will update when there are more days of data.
- The index will be the x-axis, so it's not necessary to specify x=.
- Use y= to specify column names, or don't use it and all the columns will be plotted.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2

import pandas as pd

# sample data
data = {'RowName': ['A1', 'A2', 'A3'], '20/09/21 (LP%)': [120, 0, 0], '20/09/21 (Vol)': [13, 1, 31], '20/09/21 (OI)': [0, 0, 0], '21/09/21 (LP%)': [135.0, 143.43, 143.43], '21/09/21 (Vol)': [68.6, 63.81, 58.1], '21/09/21 (OI)': [0, 0, 0], '22/09/21 (LP%)': [130, 0, 0], '22/09/21 (Vol)': [0, 0, 0], '22/09/21 (OI)': [75, 80, 85]}
df = pd.DataFrame(data)

# adjust the column names so the stub is before the date
df.columns = [' '.join(col.split(' ')[::-1]) if len(col.split(' ')) > 1 else col for col in df.columns]

# convert to long form
df = pd.wide_to_long(df, stubnames=['(LP%)', '(Vol)', '(OI)'], i=['RowName'], sep=' ', j='Date', suffix='\d+\/\d+\/\d+').reset_index()

# convert the Date column to a datetime dtype
df.Date = pd.to_datetime(df.Date).dt.date

# display(df.head())
  RowName        Date   (LP%)  (Vol)  (OI)
0      A1  2021-09-20  120.00  13.00     0
1      A2  2021-09-20    0.00   1.00     0
2      A3  2021-09-20    0.00  31.00     0
3      A1  2021-09-21  135.00  68.60     0
4      A2  2021-09-21  143.43  63.81     0

# pivot the LP% column
dlp = df.pivot(index='Date', columns='RowName', values='(LP%)')

# display(dlp)
RowName        A1      A2      A3
Date                             
2021-09-20  120.0    0.00    0.00
2021-09-21  135.0  143.43  143.43
2021-09-22  130.0    0.00    0.00

# plot any specific column
dlp.plot(y=['A1'], rot=0, figsize=(10, 5), marker='o', title='LP%: A1')

# or all 3
dlp.plot(rot=0, figsize=(10, 5), marker='o', title='LP%')

The 'A3' line covers the 'A2' line because the data is the same

# specify the kind parameter for a bar plot
dlp.plot(kind='bar', rot=0, figsize=(10, 5), title='LP%')

Add one more step and use pandas.DataFrame.melt to reshape the df into a completely long form to easily plot all of the data with seaborn.relplot.
- seaborn is a high-level API for matplotlib.

import seaborn as sns

# melt
dfm = df.melt(id_vars=['RowName', 'Date'])

# display(dfm.head())
  RowName        Date variable   value
0      A1  2021-09-20    (LP%)  120.00
1      A2  2021-09-20    (LP%)    0.00
2      A3  2021-09-20    (LP%)    0.00
3      A1  2021-09-21    (LP%)  135.00
4      A2  2021-09-21    (LP%)  143.43

# plot
sns.relplot(kind='line', data=dfm, x='Date', y='value', marker='o', row='variable', hue='RowName', height=3.5, aspect=2.25)

Upvotes: 2

alexandra

Reputation: 1192

Looking at your dataframe (assume it is named df), I think the quickest way to do this is to transpose it, seeing as you want to use rows as columns:

df_trans = df.transpose() #or df.T
df_trans.plot.line(x='RowName', y='A1', color='red')
import matplotlib.pyplot as plt
plt.show()

Maybe you want to rename the 'Rowname' column:

df_trans.rename({0: 'LP'}, axis=1)
df_trans.plot.line(x='LP', y='A1', color='red')
import matplotlib.pyplot as plt
plt.show()

Upvotes: 1

Zephyr

Reputation: 12506

You need to reshape your dataframe firstly. Starting from a dataframe like this:

  RowName  20/09/21 (LP%)  20/09/21 (Vol)  20/09/21 (OI)  21/09/21 (LP%)  21/09/21 (Vol)  21/09/21 (OI)  22/09/21 (LP%)  22/09/21 (Vol)  22/09/21 (OI)
0      A1             120              13              0          135.00           68.60              0             130               0             75
1      A2               0               1              0          143.43           63.81              0               0               0             80
2      A3               0              31              0          143.43           58.10              0               0               0             85

You can re-shape with:

# row filter
df = df.iloc[:1, :]

# column filter and transpose
df = df[[col for col in df.columns if '(LP%)' in col]].T

# convert column name to datetime value
df.index = pd.to_datetime(df.index.map(lambda x: x.split(' ')[0]))

# pass a sting as label
df.columns = ['value']

Then you can simply plot with:

df.plot()

plt.show()

Complete Code

import pandas as pd
import matplotlib.pyplot as plt


df = pd.read_csv(r'data/data.csv')


df = df.iloc[:1, :]
df = df[[col for col in df.columns if '(LP%)' in col]].T
df.index = pd.to_datetime(df.index.map(lambda x: x.split(' ')[0]))
df.columns = ['value']

df.plot()

plt.show()

You can further customize datetime axis with:

import matplotlib.dates as md

fig, ax = plt.subplots()

ax.plot(df.index, df['value'])

ax.set_xlabel('Date')
ax.set_ylabel('LP Value')

ax.xaxis.set_major_locator(md.DayLocator(interval = 1))
ax.xaxis.set_major_formatter(md.DateFormatter('%Y-%m-%d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation = 0)
ax.set_xlim([df.index[0], df.index[-1]])

plt.show()

Upvotes: 1

How to transform a wide dataframe to plot specific data

Answers (3)

Complete Code

Related Questions