Reputation: 17
I have a GeoDataFrame with a column of float values. I want to convert them to int values and then overwrite the shapefile.
import numpy as np
import pandas as pd
import geopandas as gpd
gdf=gpd.read_file(r'.\folder\gdf.shp')
This gdf has a column of float values, float_column:
gdf["float_column"]
0 1.307500e+12
1 1.307500e+12
2 1.307500e+12
3 1.307500e+12
4 1.307500e+12
5 1.307500e+12
6 1.307500e+12
7 1.307500e+12
8 1.307500e+12
9 1.307500e+12
Then I apply a transformation:
gdf["int_column"]=[int(x) for x in gdf["float_column"]]
The new column has the expected values (the conversion is correct):
gdf["int_column"]
0 1307500192816
1 1307500170116
2 1307500012418
3 1307500152317
4 1307500141816
5 1307500093417
6 1307500055117
7 1307500081117
8 1307500107717
9 1307500096916
10 1307500213815
Then I save the gdf:
gdf.to_file(r".\folder\gdf.shp",driver='ESRI Shapefile',crs_wkt=prj)
But when I read the file back to cross-check the result, int_column has these values:
gdf_try=gpd.read_file(r'.\folder\gdf.shp')
gdf_try["int_column"]
0 2147483647
1 2147483647
2 2147483647
3 2147483647
4 2147483647
5 2147483647
6 2147483647
7 2147483647
8 2147483647
9 2147483647
Which seems totally crazy! Did I miss something very stupid??
Upvotes: 0
Views: 1142
Reputation: 6426
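The issue is, as noted in the comments, due to int32 limits: 2147483647 is the largest value a signed 32-bit integer can hold (2**31 - 1), and every value in int_column is larger than that. The proper dtype isn't being inferred when the file is written, leading to the loss of information. This should be resolved with an upcoming release of fiona (which geopandas uses for reading/writing files), which will improve how int types are handled (https://github.com/Toblerity/Fiona/pull/564). In the meantime, you can set the field width explicitly in the schema: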
schema = gpd.io.file.infer_schema(gdf)
schema['properties']['int_column'] = 'int:18'
gdf.to_file('gdf.shp', schema=schema)
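To see why every row comes back as 2147483647, and why a width of 18 is more than enough here, the following is a rough sketch in plain Python. It does not call fiona or geopandas; the names `INT32_MAX`, `clamped` and `required_width` are only illustrative, and the clamping step is an assumption about what happens on write, chosen because it reproduces the values shown in the question.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

# A few of the values from int_column in the question.
values = [1307500192816, 1307500170116, 1307500012418]

# Every value is far above the int32 limit.
overflows = [v > INT32_MAX for v in values]

# Assumption: the writer saturates out-of-range values at the int32
# maximum, which matches the 2147483647 seen after reading the file back.
clamped = [min(v, INT32_MAX) for v in values]

# Number of digits the widest value needs in the attribute table.
required_width = max(len(str(abs(v))) for v in values)

print(overflows)       # all True
print(clamped)         # all equal to INT32_MAX
print(required_width)  # digits needed for the widest value
```

These values need 13 digits, so 'int:18' leaves room to spare. A narrower width might also work, but I have not checked where the writer switches between 32-bit and 64-bit integer fields.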
Upvotes: 2