I cannot read this CSV file using pd.read_csv: the number of fields differs from what is expected

Question

I have been trying for a few hours to read this file. I have researched solutions and tried applying them, but they did not work. The file itself opens fine in Excel, but I cannot read it with Pandas.

The response keeps returning the same error: ParserError: Expected 3 fields in line 5, saw 63

I have seen a few other questions on this topic, but none of the solutions to those specific questions has solved my issue.

Does anyone know why I am failing to read this file and how I can fix it? Thank you

IN:

data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv',
                 header=None,
                 engine='python',
                error_bad_lines=True)

OUT:

ParserError                               Traceback (most recent call last)
 in ()
      2                  header=None,
      3                  engine='python',
----> 4                 error_bad_lines=True)

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    444 
    445     try:
--> 446         data = parser.read(nrows)
    447     finally:
    448         parser.close()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1034                 raise ValueError('skipfooter not supported for iteration')
   1035 
-> 1036         ret = self._engine.read(nrows)
   1037 
   1038         # May alter columns / col_dict

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, rows)
   2264             content = content[1:]
   2265 
-> 2266         alldata = self._rows_to_cols(content)
   2267         data = self._exclude_implicit_index(alldata)
   2268 

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _rows_to_cols(self, content)
   2907                     msg += '. ' + reason
   2908 
-> 2909                 self._alert_malformed(msg, row_num + 1)
   2910 
   2911         # see gh-13320

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _alert_malformed(self, msg, row_num)
   2674 
   2675         if self.error_bad_lines:
-> 2676             raise ParserError(msg)
   2677         elif self.warn_bad_lines:
   2678             base = 'Skipping line {row_num}: '.format(row_num=row_num)

ParserError: Expected 3 fields in line 5, saw 63

Here is a sample of the CSV file:

"Country_Name","Country_Code","Indicator_Name","Indicator_Code","1960","1961","1962","1963","1964","1965","1966","1967","1968","1969","1970","1971","1972","1973","1974","1975","1976","1977","1978","1979","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016","2017",
"Aruba","ABW","CO2 emissions (metric tons per capita)","EN.ATM.CO2E.PC","","","","","","","","","","","","","","","","","","","","","","","","","","","2.86831939212055","7.23519803341258","10.0261792105306","10.6347325992922","26.3745032100275","26.0461298009966","21.4425588041328","22.000786163522","21.0362451108214","20.7719361585578","20.3183533653846","20.4268177083943","20.5876691453648","20.311566765912","26.1948752380219","25.9340244138733","25.6711617820448","26.4204520857169","26.5172934158421","27.200707780588","26.9482604728658","27.8955739972338","26.2308466448946","25.9158329472761","24.6705288731078","24.5058352032767","13.1555416906324","8.35129425218293","8.408362637892","","","",

Niels Henkens · Accepted Answer

Changing your code to

data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', header=None, engine='python', error_bad_lines=False)

will import your CSV, but not correctly: lines with an unexpected number of fields are skipped instead of being parsed. There is probably something wrong with the file or with the separator being used. Could you post the 5th line of the CSV you are trying to import? For example, does the last column contain text with commas? How many columns do you expect: 3, 63, or something else?
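
One way to answer those questions yourself is to count the fields on each line before handing the file to Pandas. The sketch below uses only the standard-library csv module. The helper names field_counts and first_mismatch are made up for illustration, and the sample text is a hypothetical stand-in, not your actual file: it assumes a layout with a few short preamble lines followed by a wider header row, which is one layout that would produce an "Expected 3 fields in line 5" style error.

```python
import csv
import io


def field_counts(lines, delimiter=","):
    """Return (line_number, n_fields) for every row csv.reader yields."""
    reader = csv.reader(lines, delimiter=delimiter)
    # reader.line_num is the number of source lines consumed so far
    return [(reader.line_num, len(row)) for row in reader]


def first_mismatch(counts):
    """Return the first non-blank line whose field count differs from line 1."""
    expected = counts[0][1]
    for line_number, n_fields in counts:
        if n_fields not in (0, expected):  # 0 means a blank line
            return line_number, n_fields
    return None


# Hypothetical stand-in for the real file (assumed layout, not the real data).
sample = io.StringIO(
    '"Data Source","Some database name",\n'
    "\n"
    '"Last Updated Date","2018-01-01",\n'
    "\n"
    '"Country_Name","Country_Code","1960","1961",\n'
    '"Aruba","ABW","","",\n'
)

counts = field_counts(sample)
mismatch = first_mismatch(counts)
print(counts)
print(mismatch)
```

To run this on the real file, pass an open file object instead of the sample, for example open('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', newline=''). If the first few lines turn out to be a short preamble rather than data, the skiprows parameter of pd.read_csv can skip them so that parsing starts at the real header row. Note also that on Pandas 1.3 and later, error_bad_lines is deprecated in favour of on_bad_lines.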
