Reputation: 2044
I have a data file (csv) with Nilsimsa
hash values. Some of them would have as long as 80 characters. I wish to read them in Python for data analysis tasks. Is there a way to import the data in python without information loss?
EDIT: I have tried the implementations proposed in the comments but that does not work for me.
Example data in csv file would be: 77241756221441762028881402092817125017724447303212139981668021711613168152184106
Upvotes: 3
Views: 2125
Reputation: 2044
As explained by @JohnE in his answer that we do not lose any information while reading big numbers using Pandas. They are stored as dtype=object
, to make numerical computation on them we need to transform this data into numerical type.
For series:
We have to apply the map(func)
to the series in the dataframe:
df['columnName'].map(int)
Whole dataframe:
If for some reason, our entire dataframe is composed of columns with dtype=object
, we look at applymap(func)
from the documentation of Pandas:
DataFrame.applymap(func): Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame
so to transform all columns in dataframe:
df.applymap(int)
Upvotes: 1
Reputation: 30444
Start with a simple text file to read in, just one variable and one row.
%more foo.txt
x
77241756221441762028881402092817125017724447303212139981668021711613168152184106
In [268]: df=pd.read_csv('foo.txt')
Pandas will read it in as a string because it's too big to store as a core number type like int64 or float64. But the info is there, you didn't lose anything.
In [269]: df.x
Out[269]:
0 7724175622144176202888140209281712501772444730...
Name: x, dtype: object
In [270]: type(df.x[0])
Out[270]: str
And you can use plain python to treat it as a number. Recall the caveats from the links in the comments, this isn't going to be as fast as stuff in numpy and pandas where you have stored a whole column as int64. This is using the more flexible but slower object mode to handle things.
You can change a column to be stored as longs (long integers) like this. (But note that the dtype is still object because everything except the core numpy types (int32, int64, float64, etc.) are stored as objects.)
In [271]: df.x = df.x.map(int)
And then can more or less treat it like a number.
In [272]: df.x * 2
Out[272]:
0 1544835124428835240577628041856342500354488946...
Name: x, dtype: object
You'll have to do some formatting to see the whole number. Or go the numpy route which will default to showing the whole number.
In [273]: df.x.values * 2
Out[273]: array([ 154483512442883524057762804185634250035448894606424279963336043423226336304368212L], dtype=object)
Upvotes: 1