GlaceCelery

Reputation: 1043

np.int64 is a smaller container than np.int....?

I'm getting surprising behavior trying to convert a microsecond string date to an integer:

import numpy as np

n = 20181231235959383171
int_ = np.int(n)  # Works
int64_ = np.int64(n)  # "OverflowError: int too big to convert"

Any idea why?

Edit - Thank you all, this is informative; however, please see my actual problem: Dataframe column won't convert from integer string to an actual integer

Upvotes: 0

Views: 1190

Answers (2)

Amadan

Reputation: 198476

When used as a dtype, np.int is equivalent to np.int_ (of architecture-dependent size), which is probably np.int64; so np.array([n], dtype=np.int) will fail. Outside of a dtype, np.int behaves as the plain Python int.

NumPy is basically helping you do as much of the calculation as possible in C-land, in order to speed things up and conserve memory; but (AFAIK) integers larger than 64 bits do not exist in standard C (though recent GCC versions do support a 128-bit type on some architectures). So you are stuck with either Python integers, slow but of unlimited size, or C integers, fast but not big enough for this.
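As an illustration, here is a minimal sketch of that divide (note that the np.int alias was deprecated in NumPy 1.20 and removed in 1.24, so the sketch uses the builtin int and np.int64 directly):

import numpy as np

n = 20181231235959383171

big = int(n)  # Python integers have unlimited precision, so this always works

try:
    np.int64(n)  # a 64-bit C integer cannot hold the value
except OverflowError as e:
    print("np.int64:", e)

try:
    np.array([n], dtype=np.int64)  # the same limit applies inside an array
except OverflowError as e:
    print("dtype=np.int64:", e)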

There are two obvious ways to stuff a large integer into a numpy array (both are sketched in code after this list):

  • You can use the Python type, signified by dtype=object: np.array([n], dtype=object) will work, but you are getting no speedup or memory benefits from numpy.

  • You can split the microsecond time into second time (n // 1000000) and second fractions (n % 1000000), as two separate columns.
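A minimal sketch of both options (variable names are illustrative; the split follows the n // 1000000 and n % 1000000 division above):

import numpy as np

n = 20181231235959383171

# Option 1: Python ints stored as objects -- works, but no numpy speedup
as_object = np.array([n], dtype=object)

# Option 2: split into two columns that each fit comfortably in an int64
seconds = np.array([n // 1000000], dtype=np.int64)   # 20181231235959
fractions = np.array([n % 1000000], dtype=np.int64)  # 383171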

Upvotes: 3

Alec

Reputation: 9575

An np.int can be arbitrarily large, like a Python integer.

An np.int64 can only range from -2^63 to 2^63 - 1. Your number happens to fall outside this range.
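You can check the exact bounds with np.iinfo (a small sketch):

import numpy as np

info = np.iinfo(np.int64)
print(info.min)  # -9223372036854775808
print(info.max)  #  9223372036854775807
print(20181231235959383171 > info.max)  # True -- hence the OverflowError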

Upvotes: 4
