Reputation: 2971
I'm read_excel()
to convert an xlsx spreadsheet into a dataframe. It usually works fine, but for this spreadsheet when I import it and try to list all the values with df["Date"]
it seems to repeat every date many, many times. Any idea why this is?
df = pd.read_excel("data.xlsx", sheet_name="Data")
dates = df['Date']
The result:
0 2014-12-22
1 2014-12-22
2 2014-12-22
3 2014-12-22
4 2014-12-22
...
419532 2020-01-10
419533 2020-01-10
419534 2020-01-10
419535 2020-01-10
419536 2020-01-10
edit: A bit of the spreadsheet:
As you can see the dates are repeated, they definitely do not look like that in my spreadsheet. Any idea why? Thanks
Upvotes: 1
Views: 388
Reputation: 4564
pd.read_excel()
will read the entire sheet unless you constrain it to a limited range. Excel has a number of functions to alter the display of the sheet without changing what data exists in the sheet (filtering, hiding rows, etc.). So the extra rows are likey data that exists in the sheet but have been hidden from display. abs
Some tell tale signs are if a filter button has the filtered symbol in it, blue row numbers, and hidden rows as seen below.
Check to make sure that you can see all the data so you know what is in the worksheet and not what is just visible.
To get to the same view that you have, you can filter your dataframe with conditions.
df = df[df['city'] == 'CALGARY']
or multiple conditions:
df = df[(df['city'] == 'CALGARY') & (df['utility'] == 'Suncor')]
Upvotes: 1