prashanth manohar
prashanth manohar

Reputation: 680

Plotting graphs with Altair from a Pandas Dataframe

I am trying to read table values from a spreadsheet and plot different charts using Altair. The spreadsheet can be found here

import pandas as pd
xls_file = pd.ExcelFile('PET_PRI_SPT_S1_D.xls')
xls_file


crude_df = xls_file.parse('Data 1')
crude_df

I am setting the second row values as column headers of the data frame.

crude_df.columns = crude_df.iloc[1]
crude_df.columns

Index(['Date', 'Cushing, OK WTI Spot Price FOB (Dollars per Barrel)',
       'Europe Brent Spot Price FOB (Dollars per Barrel)'],
      dtype='object', name=1)

The following is a modified version of Altair code got from documentation examples

crude_df_header = crude_df.head(100)

import altair as alt
alt.Chart(crude_df_header).mark_circle().encode(
    # Mapping the WTI column to y-axis
    y='Cushing, OK WTI Spot Price FOB (Dollars per Barrel)'
)

This does not work.

Error is shown as

TypeError: Object of type datetime is not JSON serializable

How to make 2 D plots with this data?

Also, how to make plots for number of values exceeding 5000 in Altair? Even this results in errors.

Upvotes: 0

Views: 1254

Answers (1)

LazyClown
LazyClown

Reputation: 822

Your error is due to the way you parsed the file. You have set the column name but forgot to remove the first two rows, including the ones which are now the column names. The presence of these string values resulted in the error.

The proper way of achieving what you are looking for will be as follow:

import pandas as pd
import altair as alt

crude_df = pd.read_excel(open('PET_PRI_SPT_S1_D.xls', 'rb'),
              sheet_name='Data 1',index_col=None, header=2)  

alt.Chart(crude_df.head(100)).mark_circle().encode(
    x ='Date',
    y='Cushing, OK WTI Spot Price FOB (Dollars per Barrel)'
)

enter image description here

For the max rows issue, you can use the following

alt.data_transformers.disable_max_rows()

But be mindful of the official warning

If you choose this route, please be careful: if you are making multiple plots with the dataset in a particular notebook, the notebook will grow very large and performance may suffer.

Upvotes: 2

Related Questions