I'm new to the world of data mining. I'm trying to calculate the correlation between 16 variables in a dataset of about 500 rows, and I have to do it with pandas. But I also have a problem reading the CSV file (I'm on a Mac; I don't know if that is the problem)! This is the code I used:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
corr = data.corr()
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(corr,cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = np.arange(0,len(data.columns),1)
ax.set_xticks(ticks)
plt.xticks(rotation=90)
ax.set_yticks(ticks)
ax.set_xticklabels(data.columns)
ax.set_yticklabels(data.columns)
plt.show()
And the error is:
Traceback (most recent call last):
File "/Users/myname/eclipse2-workspace/Prova/ciao.py", line 4, in <module>
data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
I have tried in a lot of ways, but I can't get it to work!
What you are trying to download is not a CSV file, but an HTML page that displays a table with the information extracted from the CSV file. You have to use the link that is created when you click Download at the top right, and pass that one to .read_csv(). It should look like this:
url = 'https://UGLYUGLYTHINGS.dl.dropboxusercontent.com/cd/0/get/MOREUGLYTHINGSHERE/file?_download_id=ENCODED_ID_OF_THE_FILE&_notify_domain=www.dropbox.com&dl=1'
The parts of the string above written in uppercase letters are placeholders for values that Dropbox generates on its backend.
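As an alternative to copying the generated link, Dropbox shared links usually serve the raw file instead of the preview page when their dl query parameter is 1 rather than 0. The helper below, to_direct_download (a hypothetical name, not part of pandas or any Dropbox SDK), is a minimal Python 3 sketch that rewrites that parameter, assuming the share link still behaves this way.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


def to_direct_download(shared_url):
    """Hypothetical helper: return the shared link with dl=1 (force download)."""
    parts = urlparse(shared_url)
    query = parse_qs(parts.query)   # e.g. {'dl': ['0']}
    query["dl"] = ["1"]             # ask Dropbox for the file, not the HTML preview
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


print(to_direct_download(
    "https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0"))
```

The rewritten URL could then be passed to pd.read_csv in place of the generated link; this still needs network access and a link that is actually shared.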
Also, don't forget to pass the character ';' as the sep parameter to .read_csv(), as follows:
data = pd.read_csv(url,sep=';')
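To see why sep matters, here is a small offline sketch using a made-up semicolon-separated string (not the real industrial_project.csv) read through io.StringIO. With the default comma separator, each whole line lands in a single column; with sep=';' the fields are split correctly.

```python
import io

import pandas as pd

# Made-up semicolon-separated sample standing in for the real file
sample = "a;b;c\n1;2;3\n4;5;6\n"

wrong = pd.read_csv(io.StringIO(sample))            # default sep=','
right = pd.read_csv(io.StringIO(sample), sep=";")   # split on ';'

print(wrong.shape)  # (2, 1): one column literally named "a;b;c"
print(right.shape)  # (2, 3): columns a, b, c
```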
If you use the correct url, the rest of the code works.
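Once the file is parsed into numeric columns, data.corr() returns the pairwise (Pearson, by default) correlation matrix that the question plots with matshow. The sketch below uses a made-up four-row sample with an id column used as the index (mirroring index_col=0 in the question); y is exactly 2x and z is exactly -x, so their correlations with x are +1 and -1.

```python
import io

import pandas as pd

# Made-up sample: y = 2 * x and z = -x
sample = "id;x;y;z\n1;1;2;-1\n2;2;4;-2\n3;3;6;-3\n4;4;8;-4\n"

data = pd.read_csv(io.StringIO(sample), sep=";", index_col=0)
corr = data.corr()  # 3 x 3 Pearson correlation matrix over x, y, z
print(corr.round(2))
```

If the real file also contains non-numeric columns, recent pandas versions raise an error in corr() unless numeric_only=True is passed, so it is worth checking data.dtypes first.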
Also, as mentioned in the comment above, please change the title of your question, because it may mislead other readers. The issue lies in reading a remote file, rather than in computing the correlation.
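The ParserError in the question can be reproduced offline, which shows it comes from the content being parsed (neither macOS nor the correlation code is involved). The string below is a made-up stand-in for the beginning of an HTML page: its first line has one comma-separated field and its third line has two, which is enough to trigger the same "Expected 1 fields in line 3, saw 2" error.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Made-up stand-in for an HTML page: line 3 happens to contain a comma
not_really_csv = "<!DOCTYPE html>\n<html>\n<head><title>a, b</title>\n"

error = None
try:
    pd.read_csv(io.StringIO(not_really_csv))
except ParserError as exc:  # raised by the C tokenizer on the bad line
    error = exc

print(error)
```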