Reputation: 1043
I'm getting surprising behavior trying to convert a microsecond string date to an integer:
n = 20181231235959383171
int_ = np.int(n) # Works
int64_ = np.int64(n) # "OverflowError: int too big to convert"
Any idea why?
Edit: Thank you all, this is informative; however, please see my actual problem: Dataframe column won't convert from integer string to an actual integer
Upvotes: 0
Views: 1190
Reputation: 198476
When used as a dtype, np.int is equivalent to np.int_ (architecture-dependent size), which is probably np.int64. So np.array([n], dtype=np.int) will fail. Outside dtype, np.int behaves as Python int. Numpy is basically helping you calculate as much stuff in C-land as possible in order to speed up the calculations and conserve memory; but (AFAIK) integers larger than 64 bits do not exist in standard C (though newer GCC versions do support them on some architectures). So you are stuck using either Python integers, slow but of unlimited size, or C integers, fast but not big enough for this.
There are two obvious ways to stuff a large integer into a numpy array (both sketched below):
- You can use the Python type, signified by dtype=object: np.array([n], dtype=object) will work, but you get no speedup or memory benefits from numpy.
- You can split the microsecond time into second time (n // 1000000) and second fractions (n % 1000000), as two separate columns.
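A quick sketch of both options, using the n from the question:

import numpy as np

n = 20181231235959383171

# Option 1: store Python ints as objects -- works, but loses numpy's speed.
as_object = np.array([n], dtype=object)

# Option 2: split into whole seconds and microsecond remainder; both parts
# fit comfortably within int64.
seconds = np.array([n // 1000000], dtype=np.int64)  # 20181231235959
micros = np.array([n % 1000000], dtype=np.int64)    # 383171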
Upvotes: 3
Reputation: 9575
An np.int can be arbitrarily large, like a Python integer.
An np.int64 can only range from -2^63 to 2^63 - 1. Your number happens to fall outside this range.
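You can confirm the bounds directly with np.iinfo:

import numpy as np

bounds = np.iinfo(np.int64)
print(bounds.min)  # -9223372036854775808, i.e. -2**63
print(bounds.max)  #  9223372036854775807, i.e. 2**63 - 1
# 20181231235959383171 > 2**63 - 1, hence the OverflowError.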
Upvotes: 4