le_crease

Reputation: 69

pd.read_csv - Disregard first N rows

I'm trying to do some AWS pricing analysis using Pandas, and it involves loading EC2 pricing data into a DataFrame from their pricing API. Unfortunately, the file starts with 5 descriptor rows of 2 columns each before the useful data begins (see image). The parser raises an error when it reaches the start of the useful data, which has 51 columns.

How can I tell it to ignore the first 5 rows, and to treat the 6th row as my column headers?

Here's where I'm at:

import pandas as pd
import requests
import io

pricing_url = "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/index.csv"
pricing_r = requests.get(pricing_url).content
pricing = pd.read_csv(io.StringIO(pricing_r.decode('utf-8')))

ParserError: Error tokenizing data. C error: Expected 2 fields in line 6, saw 51

Upvotes: 2

Views: 399

Answers (1)

jack6e

Reputation: 1522

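As always, the docs are useful here. Set the header argument to the zero-based index of the row you want to use for the column names. With header=5, the 6th row becomes the header, the data starts on the row after it, and the 5 descriptor rows before it are skipped: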

pricing = pd.read_csv(io.StringIO(pricing_r.decode('utf-8')), header=5)
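To see the effect without downloading the full pricing file, here is a minimal sketch on a small in-memory CSV. The sample text and its column names (SKU, Location, PricePerUnit) are made up for illustration and are not the real AWS layout; only the shape (5 two-field descriptor rows, then a wider header row) mirrors the question. It also shows that skiprows=5 should give the same result here, since the first remaining row is then taken as the header by default.

```python
import io
import pandas as pd

# Hypothetical stand-in for the AWS file: 5 two-field descriptor rows,
# followed by the real header row and then the data rows.
sample = (
    "FormatVersion,v1.0\n"
    "Disclaimer,example only\n"
    "Publication Date,2020-01-01\n"
    "Version,1\n"
    "OfferCode,AmazonEC2\n"
    "SKU,Location,PricePerUnit\n"
    "ABC123,US East,0.10\n"
    "DEF456,EU West,0.12\n"
)

# header is zero-based: row 5 is the 6th line of the file.
by_header = pd.read_csv(io.StringIO(sample), header=5)

# Alternative: drop the first 5 lines, then use the default header=0.
by_skiprows = pd.read_csv(io.StringIO(sample), skiprows=5)

print(by_header)
```

Either form should avoid the tokenizing error, because the 2-field descriptor rows are never parsed as data.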

Upvotes: 3
