Reputation: 197
I have an odd situation in which pandas (I am assuming it is pandas and not Python) gives an inconsistent error. I am running Python 2.7.11 with pandas 0.17.1 on a Windows 10 machine.
The basic error is this: if I have two DataFrames with matching indexes and simply compute dfA / dfB - 1, the calculation returns inconsistent results when it is re-run many times.
Specifically:
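import pandas as pd

# Raw strings so the backslashes in the Windows paths are taken literally
close = pd.read_csv(r"C:\close.csv")
shifted = pd.read_csv(r"C:\shifted.csv")

# Element-wise return on column C; both frames share the same index
ret = shifted.C / close.C - 1
foo = ret.min()  # Series.min()/max() skip NaN; the builtins are order-dependent with NaN
bar = ret.max()
print("Starting with Max: %.4f Min %.4f" % (bar, foo))

# Re-run the identical calculation; the extremes should never change
for i in range(1000):
    ret = shifted.C / close.C - 1
    foo = ret.min()
    bar = ret.max()
    if foo < -.17 or bar > .16:
        print("Error on run %i: Max: %.4f Min %.4f" % (i, bar, foo))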
I have put the .py file and the two CSVs at this link.
Put the CSVs in your C: root (or change the file paths in the code) and run the code. If it doesn't error, run it again and it likely will. Even the error frequency is inconsistent: sometimes it errors 20+ times in a thousand iterations, but usually only 1-2 times.
This seems like pretty basic functionality, so I must be doing something wrong. This came out of a much larger project where I assumed it was due to NaNs being handled inconsistently, but this example shows that is not the case.
Any help would be appreciated. Thank you!
Update: per @EdChum's implied suggestion, I updated to Python 3.5.1 |Anaconda 2.4.1 (64-bit)| (default, Dec 7 2015, 15:00:12) [MSC v.1900 64 bit (AMD64)] on win32.
Pandas version is 0.17.1 and Numpy is 1.10.1.
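Because the culprit later turned out to be NumExpr, an optional dependency that pandas uses when it is installed, it can help to report its version alongside pandas and NumPy when describing a bug like this. A minimal sketch using only the standard library; `report_versions` is a hypothetical helper, not a pandas API:

```python
import importlib
import sys


def report_versions(packages=("pandas", "numpy", "numexpr")):
    """Return {package: version string, "unknown", or None if not importable}."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # package is not installed in this environment
        else:
            # Most packages expose __version__; fall back if one does not
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", sys.version)
    for name, version in report_versions().items():
        print(name, version if version is not None else "not installed")
```

pandas also ships `pd.show_versions()`, which prints a fuller report that includes optional dependencies such as numexpr.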
Lest you think I'm crazy (I probably would if someone came to me with this error), here are the results from a few runs of the little program. Errors seem to be rarer now, but they still happen. [Screenshot: errors on Windows 10 machine]
Any thoughts? A memory issue of some type? What could cause an intermittent error in such a simple operation?
Update #2: Thinking this might be some kind of memory issue, I rewrote the code to simply count the number of errors in the operation, and got these highly suspicious results:
85 errors in 20,000 runs on 10,100 DataFrame rows
144 errors in 20,000 runs on 10,001 DataFrame rows
0 errors in 20,000 runs on 10,000 DataFrame rows
0 errors in 20,000 runs on 9,999 DataFrame rows
10,000 rows is not a lot, but the jump from 0 errors at 10,000 rows to 144 errors at 10,001 rows suggests that this row count is the trigger. Is there some limitation in pandas that I should be aware of?
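A minimal sketch of that error-counting experiment, assuming synthetic random prices in place of the original CSVs (which are not reproduced here). `count_mismatches` is a hypothetical helper name, and instead of the fixed -.17/.16 thresholds it compares every run against a reference result computed directly with NumPy:

```python
import numpy as np
import pandas as pd


def count_mismatches(n_rows, n_runs=1000, seed=0):
    """Count runs where pandas' shifted / close - 1 disagrees with plain NumPy.

    Synthetic positive prices stand in for the original CSV data (hypothetical).
    """
    rng = np.random.RandomState(seed)
    close = pd.Series(rng.uniform(10.0, 100.0, n_rows))
    shifted = pd.Series(rng.uniform(10.0, 100.0, n_rows))
    # Reference result computed on the raw NumPy arrays, outside pandas
    expected = shifted.values / close.values - 1
    errors = 0
    for _ in range(n_runs):
        ret = shifted / close - 1  # the same Series arithmetic as in the question
        if not np.allclose(ret.values, expected):
            errors += 1
    return errors


if __name__ == "__main__":
    for n in (9999, 10000, 10001, 10100):
        print("%d errors in 1000 runs on %d rows" % (count_mismatches(n), n))
```

On a healthy installation every count should be 0; a non-zero count only above a certain row count would reproduce the pattern reported here.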
Upvotes: 2
Views: 705
Reputation: 1081
This error is caused by NumExpr version 2.4.4. We (Continuum) will be updating this package soon; the update has been confirmed to fix this issue. Until then, you can remove numexpr:
conda remove numexpr
See this related issue: https://github.com/pydata/pandas/issues/11743
EDIT (01/14/16): NumExpr 2.4.6 should now be available.
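A likely reason the problem only appeared above 10,000 rows is that pandas hands element-wise arithmetic to NumExpr only when an operand has more than 10,000 elements (the `_MIN_ELEMENTS` constant in pandas' internal expressions module); below that it uses plain NumPy. If removing the package is not convenient, newer pandas versions (0.20 and later, so not the 0.17.1 used in the question) expose a `compute.use_numexpr` option to bypass NumExpr. A minimal sketch, assuming such a version; `safe_return` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def safe_return(shifted, close):
    """Compute shifted / close - 1 with NumExpr disabled for this call only.

    Assumes pandas >= 0.20, where the "compute.use_numexpr" option exists.
    """
    with pd.option_context("compute.use_numexpr", False):
        return shifted / close - 1


close = pd.Series(np.linspace(10.0, 20.0, 10001))  # more than 10,000 elements
shifted = close * 1.01
ret = safe_return(shifted, close)
print("Max: %.4f Min %.4f" % (ret.max(), ret.min()))
```

`pd.option_context` restores the previous setting on exit, so the rest of the program keeps its defaults; `pd.set_option("compute.use_numexpr", False)` would instead disable NumExpr globally.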
Upvotes: 2
Reputation: 197
Looks like the problem is with the Anaconda installation: a plain (non-Anaconda) Python installation solves the problem on Windows. Scary bug. Thank you to everyone who looked at it!
Upvotes: 0