Reputation: 2452
My question is: how do I definitively close HDF5 files after writing them?
I am trying to save data to HDF5 files. There are around 200 folders, and each folder contains data for each day of this year.
When I retrieve and save data using pandas HDFStore with the following code in the IPython console, the function stops automatically after a while (no error message).
import pandas as pd
data = ... # in format as pd.DataFrame
# Method 1
data.to_hdf('D:/file_001/2016-01-01.h5', 'type_1')
# Method 2
with pd.HDFStore('D:/file_001/2016-01-01.h5', 'a') as hf:
    hf['type_1'] = data
When I ran the same script again to download the data, it reported:
[Errno 24] Too many open files: ...
Some posts suggest raising the limit on Linux with, for example, ulimit -n 1200, but unfortunately I'm using Windows.
Besides, I think I already close the files explicitly by using a with block, especially in Method 2. Why does IPython still count these files as open?
My loop is something like this:
univ = pd.read_excel(univ_file, univ_tab)
for dt in pd.DatetimeIndex(start=start_date, end=end_date, freq='B'):
    for t in univ:
        data = download_data(t, dt)
        with pd.HDFStore(data_file, 'a') as hf:
            # Use pd.DataFrame([np.nan]) instead of pd.DataFrame() to save space
            hf[typ] = EMPTY_DF if data.shape[0] == 0 else data
Upvotes: 3
Views: 1087
Reputation: 210842
You can list all open files belonging to the Python process on Windows using the psutil module.
Demo:
In [52]: [proc.open_files() for proc in psutil.process_iter() if proc.pid == os.getpid()]
Out[52]:
[[popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1),
popenfile(path='C:\\Users\\Max\\.ipython\\profile_default\\history.sqlite-journal', fd=-1),
popenfile(path='C:\\Users\\Max\\.ipython\\profile_default\\history.sqlite', fd=-1)]]
The file handle is closed as soon as the following block finishes:
In [53]: with pd.HDFStore('d:/temp/1.h5', 'a') as hf:
   ....:     hf['df2'] = df
   ....:
Proof:
In [54]: [proc.open_files() for proc in psutil.process_iter() if proc.pid == os.getpid()]
Out[54]:
[[popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1),
popenfile(path='C:\\Users\\Max\\.ipython\\profile_default\\history.sqlite', fd=-1)]]
To check that psutil works properly at all, open a file by hand (note the D:\\temp\\aaa entry):
In [55]: fd = open('d:/temp/aaa', 'w')
In [56]: [proc.open_files() for proc in psutil.process_iter() if proc.pid == os.getpid()]
Out[56]:
[[popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1),
popenfile(path='D:\\temp\\aaa', fd=-1),
popenfile(path='C:\\Users\\Max\\.ipython\\profile_default\\history.sqlite', fd=-1)]]
In [57]: fd.close()
In [58]: [proc.open_files() for proc in psutil.process_iter() if proc.pid == os.getpid()]
Out[58]:
[[popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1),
popenfile(path='C:\\Users\\Max\\.ipython\\profile_default\\history.sqlite', fd=-1)]]
Using this technique, you can debug your code and find the place where the number of open files starts to grow in your case.
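To apply this inside a long-running loop, it may help to wrap the check in a small helper. The sketch below is only an illustration: `open_paths` and `leaked_since` are hypothetical names, not part of psutil, and it assumes that psutil can enumerate the handles of the current process on your system.

```python
import psutil


def open_paths():
    """Return the paths of the regular files this Python process holds open."""
    # psutil.Process() with no argument refers to the current process.
    return [f.path for f in psutil.Process().open_files()]


def leaked_since(baseline):
    """Return paths that are open now but were not in `baseline`.

    `baseline` is a list previously returned by open_paths(). Paths are
    matched one-for-one, so a file opened twice is reported if only one
    of its handles existed at baseline time.
    """
    remaining = list(baseline)
    leaked = []
    for path in open_paths():
        if path in remaining:
            remaining.remove(path)
        else:
            leaked.append(path)
    return leaked
```

Take `baseline = open_paths()` before the loop and print `leaked_since(baseline)` after each iteration (or every N iterations). If the list keeps growing, the paths in it should point to whichever call is leaving files open.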
Upvotes: 1