Reputation: 6950
My code is the following:
import pandas as pd
import numpy as np
df = pd.read_csv("path/to/my/infile.csv")
df = df.sort_values(['distance', 'time'])
df.to_csv("path/to/my/outfile.csv")
This code successfully reads infile.csv, a 3 GB CSV file, and sorts it, but it fails when writing to outfile.csv with the following error:
OSError Traceback (most recent call last)
<ipython-input-10-3a5c8279658d> in <module>
----> 1 df.to_csv('/Users/joaomatos/Desktop/cluster22_sorted_training.csv',index=False)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1743 doublequote=doublequote,
1744 escapechar=escapechar, decimal=decimal)
-> 1745 formatter.save()
1746
1747 if path_or_buf is None:
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/formats/csvs.py in save(self)
164 encoding=encoding,
165 compression=self.compression)
--> 166 f.write(buf)
167 f.close()
168 for _fh in handles:
OSError: [Errno 22] Invalid argument
My question is why?
Upvotes: 14
Views: 39729
Reputation: 3598
import pandas as pd
print(pd.__file__)  # show where pandas is installed
loan_data = pd.read_csv(r"D:\Lib\site-packages\pandas\email-password.csv")
loan_data
Upvotes: 0
Reputation: 1
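Remove any existing file with the same name in "path/to/my/", or check for it before writing:
import os
files_present = os.path.exists(filename)
if not files_present:
    df.to_csv(filename)
else:
    print('WARNING: This file already exists!')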
Upvotes: 0
Reputation: 78
One of the answers suggests substituting \ with /. This works for me as well, but it can be tedious. An alternative is to put an r in front of the string (like this: r"my\string"), which tells the interpreter to treat the backslashes (\) as raw characters, instead of combining them with the next character to form a special character (e.g., \n, \t).
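A minimal sketch of the difference, using a made-up Windows-style path (the path is hypothetical and only there for illustration). It shows a normal string literal silently turning \t and \n into a tab and a newline, while a raw string or forward slashes keep the path intact.

```python
from pathlib import PureWindowsPath

# Hypothetical path, for illustration only.
plain = "C:\temp\new_data.csv"      # \t and \n become a tab and a newline
raw = r"C:\temp\new_data.csv"       # backslashes are kept literally
forward = "C:/temp/new_data.csv"    # forward slashes need no escaping

print(repr(plain))
print(repr(raw))

# PureWindowsPath treats both separators as equivalent.
same = PureWindowsPath(raw) == PureWindowsPath(forward)
print(same)
```

Note that a raw string cannot end in a single backslash, so a trailing separator has to be left off or added separately.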
Upvotes: 1
Reputation: 111
I resolved this by removing unusual characters (such as the colons and the space) from the timestamp that made up part of the filename.
Not working:
file_timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
df.to_csv(file_name + '_' + file_timestamp + '.csv')
Working:
file_timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
df.to_csv(file_name + '_' + file_timestamp + '.csv')
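As a rough illustration of why the second format is safer: Windows rejects a handful of characters in file names, the colon among them. The sketch below builds a timestamped name and checks it against that set. Both helper names (`timestamped_name`, `has_unsafe_chars`) and the "report" stem are hypothetical, not part of pandas or of the answer above.

```python
import datetime
import re

def timestamped_name(stem, when=None):
    """Return 'stem_YYYYmmdd_HHMMSS.csv' (hypothetical helper)."""
    when = when or datetime.datetime.now()
    return f"{stem}_{when.strftime('%Y%m%d_%H%M%S')}.csv"

def has_unsafe_chars(name):
    """True if name contains characters Windows rejects in file names."""
    return re.search(r'[<>:"/\\|?*]', name) is not None

# A fixed moment so the example is reproducible.
fixed = datetime.datetime(2021, 5, 17, 9, 30, 5)
bad = "report_" + fixed.strftime("%Y-%m-%d %H:%M:%S") + ".csv"
good = timestamped_name("report", fixed)

print(bad, has_unsafe_chars(bad))
print(good, has_unsafe_chars(good))
```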
Upvotes: 1
Reputation: 159
I just had a similar issue. I was using backslashes (\), which had usually worked in the past, but this time it turned out I had to use / instead. That is extremely weird, but it worked.
Upvotes: 15
Reputation: 370
In my case (working on an external hard drive) it worked once I specified the absolute, rather than the relative, path.
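For reference, a small sketch of turning a relative path into an absolute one before handing it to to_csv. The "data/outfile.csv" path is a placeholder, and whether this helps will depend on the current working directory and the drive involved.

```python
import os

# Hypothetical relative path, for illustration only.
relative = os.path.join("data", "outfile.csv")

# abspath resolves the path against the current working directory.
absolute = os.path.abspath(relative)

print(os.getcwd())
print(absolute)
print(os.path.isabs(relative), os.path.isabs(absolute))
```

The absolute value can then be passed as the first argument of df.to_csv.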
Upvotes: 1
Reputation: 311
After exploring a lot of options, including updating pandas to the latest version (1.2.4 as of today), changing the engine to "python" or "c", debugging, etc., I finally discovered what the issue was:
I had my CSV files stored in a folder that was constantly being synchronized in real time with OneDrive.
YES! I noticed that the tray icon was going crazy and OneDrive was consuming resources at the same time I was running algorithmic trading backtests for my pet project. I paused sync and it never failed again!!
I guess you can also exclude the folder from OneDrive or simply change the location where the CSVs are stored/written/accessed.
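One way to sketch the "change the location" idea is to write the file in a temporary directory first and only then move the finished file to its destination. This is an assumption about what might help, not something the answer above tested. `write_then_move` and `demo_writer` are hypothetical names, and in real use the writer would be something like `lambda p: df.to_csv(p, index=False)`.

```python
import os
import shutil
import tempfile

def write_then_move(write_func, final_path):
    """Call write_func(tmp_path) in a temp dir, then move the result."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
        write_func(tmp_path)                 # all writing happens here
        shutil.move(tmp_path, final_path)    # one move into the target
    return final_path

def demo_writer(path):
    """Stand-in for df.to_csv: writes a tiny CSV file."""
    with open(path, "w", newline="") as fh:
        fh.write("a,b\n1,2\n")

# A fresh temp directory stands in for the synced folder.
target_dir = tempfile.mkdtemp()
out = write_then_move(demo_writer, os.path.join(target_dir, "out.csv"))

with open(out) as fh:
    content = fh.read()
print(content)
```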
Upvotes: 21
Reputation: 6950
Apparently this problem is caused by a known bug, reported here, associated with a previous version of pandas. All I had to do was run pip3 install --upgrade pandas and then restart the computer.
Upvotes: 4