Reputation: 164
I am going through chapter 1 of the Pandas "cook book", the bikes.csv example. When I change the arguments to parse_dates=['Date'], dayfirst=True and index_col='Date', like this (at line In [6] of the cook book's 1st chapter):
fixed_df = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1', parse_dates=['Date'], dayfirst=True, index_col='Date')
I get this: ValueError: 'Date' is not in list. Before writing here, I tried these solutions:
1st: utf-8 BOM problem
As I understand it, a BOM in the utf-8 file creates a problem and causes this error. In addition, is the "Dates" line read as a tuple by pandas? (Sorry if I use the wrong words; this is what I remember, and I am not a pro at Python.) I tried to convert the encoding with this suggestion:
the "utf-8-sig" codec gives a unicode string without the BOM:
fp = open("file.txt")
s = fp.read()
u = s.decode("utf-8-sig")
Although I did not get any error, it did not work.
2nd: iconv and Vim. I tried these commands to change the encoding:
iconv -f UTF-8 -t ISO-8859-1 infile.txt > outfile.txt
and this,
vim +"set nobomb | set fenc=utf8 | x" filename.txt
Neither of them worked.
3rd: I tried to change the file encoding after opening the file with vim:
set fileencoding=utf-8-sig
(and other possible encodings like ANSI, ASCII etc.)
I got this error:
E213: Cannot convert (add ! to write without conversion)
Could you please help me see what I am missing? Many thanks in advance.
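For reference, the utf-8-sig suggestion quoted in the first attempt is written in Python 2 style (str.decode). A minimal Python 3 sketch of the same idea is below. The file content is a made-up sample, not the real bikes.csv, and the temporary file only exists to make the example self-contained.

```python
import codecs
import os
import tempfile

# Hypothetical sample: a UTF-8 file that starts with a byte order mark (BOM).
raw = codecs.BOM_UTF8 + "Date;Berri1\n01/01/2012;35\n".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Plain utf-8 keeps the BOM as an invisible first character.
    with open(path, encoding="utf-8") as fp:
        with_bom = fp.read()
    # utf-8-sig strips the BOM while decoding.
    with open(path, encoding="utf-8-sig") as fp:
        without_bom = fp.read()
finally:
    os.remove(path)

print(repr(with_bom.splitlines()[0]))     # '\ufeffDate;Berri1'
print(repr(without_bom.splitlines()[0]))  # 'Date;Berri1'
```

This only shows how the codec treats a BOM; it does not by itself establish that a BOM is what causes the ValueError here.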
Upvotes: 4
Views: 9890
Reputation:
With the URL you provided
url = 'http://donnees.ville.montreal.qc.ca/dataset/f170fecc-18db-44bc-b4fe-5b0b6d2c7297/resource/d54cec49-349e-47af-b152-7740056d7311/download/comptagevelo2012.csv'
df = pd.read_csv(url, sep=',', parse_dates={'datetime':[0, 1]}, index_col='datetime')
df.head()
gives
            Rachel / Papineau  Berri1  Maisonneuve_2  Maisonneuve_1  Brébeuf  \
datetime
2012-01-01                 16      35             51             38      5.0
2012-02-01                 43      83            153             68     11.0
2012-03-01                 58     135            248            104      2.0
2012-04-01                 61     144            318            116      2.0
2012-05-01                 95     197            330            124      6.0
I have changed both the sep and encoding arguments because the separator in that file is a comma and the encoding is utf-8 (the default value for read_csv, so it can be left out). There is also an unnamed column for time, which you can include in the parsing too. In this example I think the times are all zero, but this might be useful in other cases.
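To see how a wrong separator can lead to "'Date' is not in list", here is a small sketch. The CSV text is a hypothetical two-row sample with a simplified layout (the real file has more columns, including the unnamed time column), so treat it as an illustration rather than a copy of the data.

```python
import io

import pandas as pd

# Hypothetical two-row sample, comma-separated like the downloaded file.
sample = "Date,Berri1\n01/01/2012,35\n02/01/2012,83\n"

# With the wrong separator the whole header becomes one column name,
# so no column called 'Date' exists.
wrong = pd.read_csv(io.StringIO(sample), sep=";")
print(list(wrong.columns))  # ['Date,Berri1']

# With the matching separator, the arguments from the question work.
right = pd.read_csv(
    io.StringIO(sample),
    sep=",",
    parse_dates=["Date"],
    dayfirst=True,
    index_col="Date",
)
print(right.index[1])  # 2012-01-02 00:00:00
```

Printing the columns of the frame read without parse_dates is a quick way to check which separator the file really uses. Note that with dayfirst=True a value such as 02/01/2012 is read as 2 January; without it, the same value is read as 1 February.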
Upvotes: 6