Having trouble reading a CSV file with pandas

Question

I'm new to data mining. I'm trying to calculate the correlation between 16 variables in a dataset of about 500 rows, and I have to do it with pandas. However, I can't even get the CSV file to read (I'm on a Mac; I don't know whether that is the problem). This is the code I used:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
corr = data.corr()
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(corr,cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = np.arange(0,len(data.columns),1)
ax.set_xticks(ticks)
plt.xticks(rotation=90)
ax.set_yticks(ticks)
ax.set_xticklabels(data.columns)
ax.set_yticklabels(data.columns)
plt.show()

And the error is:

Traceback (most recent call last):
  File "/Users/myname/eclipse2-workspace/Prova/ciao.py", line 4, in <module>
    data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

I have tried many different approaches, but I can't get it to work!

Daneel R. · Accepted Answer

What you are trying to download is not a CSV file, but an HTML page that displays a table with the information extracted from the CSV file. You have to use the link that is created when you click Download at the top right, and pass that one to .read_csv(). It should look like this:

url = 'https://UGLYUGLYTHINGS.dl.dropboxusercontent.com/cd/0/get/MOREUGLYTHINGSHERE/file?_download_id=ENCODED_ID_OF_THE_FILE&_notify_domain=www.dropbox.com&dl=1'

The uppercase parts of the string above stand for whatever Dropbox generates on its back end.
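As an alternative to copying the generated download link by hand, Dropbox share links are commonly switched to a direct download by setting the dl query parameter to 1. The sketch below does that with the standard library only. It is written for Python 3, the helper name to_direct_download is made up for illustration, and it assumes Dropbox still honours dl=1 for this kind of shared link, which I have not verified for this particular file.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download(share_url):
    """Return share_url with its 'dl' query parameter forced to '1'.

    Hypothetical helper: it only rewrites the URL string and does not
    contact Dropbox, so it cannot tell whether the link is still valid.
    """
    parts = urlsplit(share_url)
    # Keep any other query parameters and override (or add) dl.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


share_url = 'https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0'
direct_url = to_direct_download(share_url)
print(direct_url)
```

The resulting direct_url can then be passed to .read_csv() in place of the original share link.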
Also, don't forget to pass ';' as the sep parameter to .read_csv(), as follows:

data = pd.read_csv(url,sep=';')
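To see why the separator matters, here is a small offline sketch. The sample text is invented for illustration and is not the asker's data; it only assumes the real file is semicolon-delimited, as stated above. With the default comma separator pandas sees a single column, while sep=';' recovers the actual columns so that .corr() has something to work with.

```python
import io

import pandas as pd

# Made-up semicolon-delimited sample standing in for the real file.
raw = "id;x;y\n1;0.5;2.0\n2;1.5;4.0\n3;2.5;6.5\n"

# Default separator is ',', so each line is read as one single field.
wrong = pd.read_csv(io.StringIO(raw))

# With sep=';' the three fields are split; 'id' becomes the index.
right = pd.read_csv(io.StringIO(raw), sep=';', index_col=0)

print(wrong.shape)
print(right.shape)

# The correlation matrix is now computed over the numeric columns x and y.
corr = right.corr()
print(corr)
```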

If you use the correct URL, the rest of the code works.
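If you are unsure whether a link returns the file itself or a web page, a quick check on the start of the downloaded text can catch the mistake before pandas raises a tokenizing error. This is only a rough heuristic, the function name looks_like_html is hypothetical, and both sample strings are invented for illustration.

```python
def looks_like_html(text):
    """Heuristic: does the text start like an HTML document?"""
    # Ignore leading whitespace and compare case-insensitively.
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Invented samples: a web page wrapper versus raw semicolon-delimited data.
html_sample = "<!DOCTYPE html>\n<html><head><title>Dropbox</title></head></html>"
csv_sample = "id;x;y\n1;0.5;2.0\n"

print(looks_like_html(html_sample))
print(looks_like_html(csv_sample))
```

In practice you would fetch the first part of the response yourself and run the check on that text before handing the URL to .read_csv().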

Also, as mentioned in the comment above, please change the header/title of your question, because it may mislead someone. The issue lies in reading a remote file, rather than computing the correlation.
