GeoPandas .to_file gives wrong column?

Question

I have a GeoDataFrame with a column of float values, and I want to convert them to int values and then overwrite the shapefile.

import numpy as np
import pandas as pd
import geopandas as gpd

gdf=gpd.read_file(r'.\folder\gdf.shp')

This gdf has a column of float values, float_column:

gdf["float_column"]

0      1.307500e+12
1      1.307500e+12
2      1.307500e+12
3      1.307500e+12
4      1.307500e+12
5      1.307500e+12
6      1.307500e+12
7      1.307500e+12
8      1.307500e+12
9      1.307500e+12

Then I apply a transformation:

gdf["int_column"]=[int(x) for x in gdf["float_column"]]

The new column has these values (the conversion is correct):

gdf["int_column"]

0      1307500192816
1      1307500170116
2      1307500012418
3      1307500152317
4      1307500141816
5      1307500093417
6      1307500055117
7      1307500081117
8      1307500107717
9      1307500096916
10     1307500213815
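
The float column only looks constant because pandas prints it in scientific notation with six significant digits; the distinct integers show that the underlying float64 values differ. The conversion is not where the data gets lost either: a float64 represents every integer up to 2**53 exactly, so numbers around 1.3e12 are stored without rounding and int(x) keeps every digit of the integer part (gdf["float_column"].astype("int64") is the vectorized equivalent). A minimal stdlib check, using a few values copied from the output above:

```python
# float64 (Python's float) represents every integer n with |n| <= 2**53 exactly.
MAX_EXACT_INT = 2 ** 53

# A few values copied from the int_column output in the question.
values = [1307500192816, 1307500170116, 1307500012418]

for n in values:
    # These values are far below the exact-integer limit of float64...
    assert abs(n) <= MAX_EXACT_INT
    # ...so an int -> float -> int round trip returns the same number.
    assert int(float(n)) == n

print("float <-> int conversion is exact for these values")
```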

Then I save the gdf:

# prj holds the layer's CRS as a WKT string (defined earlier, not shown)
gdf.to_file(r".\folder\gdf.shp",driver='ESRI Shapefile',crs_wkt=prj)

But when I read the file back to check, int_column has these values:

gdf_try=gpd.read_file(r'.\folder\gdf.shp')

gdf_try["int_column"]

0      2147483647
1      2147483647
2      2147483647
3      2147483647
4      2147483647
5      2147483647
6      2147483647
7      2147483647
8      2147483647
9      2147483647

This seems totally crazy! Did I miss something obvious?
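
The repeated number is not random: 2147483647 is 2**31 - 1, the largest value a signed 32-bit integer can hold, and every value in int_column is larger than that. The sketch below assumes the writer saturates out-of-range values to the int32 limit; that assumption is inferred from the output above rather than confirmed from fiona or GDAL source, and clamp_to_int32 is a hypothetical helper that only reproduces the symptom:

```python
# Limits of a signed 32-bit integer.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1  # 2147483647


def fits_int32(n: int) -> bool:
    """True if n can be stored in a signed 32-bit integer field."""
    return INT32_MIN <= n <= INT32_MAX


def clamp_to_int32(n: int) -> int:
    """Hypothetical helper: saturate n to the int32 range, mimicking the observed output."""
    return max(INT32_MIN, min(n, INT32_MAX))


values = [1307500192816, 1307500170116, 1307500012418]
print([fits_int32(n) for n in values])      # -> [False, False, False]
print([clamp_to_int32(n) for n in values])  # -> [2147483647, 2147483647, 2147483647]
```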

jdmcbr · Accepted Answer

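The issue is, as noted in the comments, due to int32 limits: 2147483647 is the largest value a signed 32-bit integer can hold (2**31 - 1), and every value in int_column exceeds it. The proper dtype isn't being inferred when the file is written, so the column ends up in a 32-bit integer field and the information is lost. This should be resolved with an upcoming release of fiona (which geopandas uses for reading and writing files), which improves how int types are handled (https://github.com/Toblerity/Fiona/pull/564). In the meantime, you can pass an explicit schema: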

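# Start from the schema geopandas would infer for this GeoDataFrame
schema = gpd.io.file.infer_schema(gdf)
# 'int:18' declares an integer field 18 digits wide, enough for the 13-digit values here
schema['properties']['int_column'] = 'int:18'
gdf.to_file('gdf.shp', schema=schema)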
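
If several integer columns might overflow, the same workaround can be applied generically. A minimal sketch, assuming the schema is the dict returned by gpd.io.file.infer_schema (a 'properties' mapping of column name to type string); widen_int_fields is a hypothetical helper, not part of geopandas or fiona, and the demo schema at the bottom is made up for illustration:

```python
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def widen_int_fields(schema, columns, width=18):
    """Hypothetical helper: return a copy of a fiona-style schema in which
    every listed integer column with values outside the int32 range is
    declared as 'int:<width>'.

    schema  -- dict with a 'properties' mapping (e.g. from infer_schema(gdf))
    columns -- mapping of column name -> iterable of integer values
    """
    props = schema["properties"].copy()  # keep the original schema untouched
    for name, values in columns.items():
        if any(v < INT32_MIN or v > INT32_MAX for v in values):
            props[name] = f"int:{width}"
    return {**schema, "properties": props}


# With geopandas, usage would look like (not executed here):
#   schema = widen_int_fields(gpd.io.file.infer_schema(gdf),
#                             {c: gdf[c] for c in gdf.select_dtypes("int64").columns})
#   gdf.to_file("gdf.shp", schema=schema)

# Standalone demo on a made-up schema dict:
schema = {"geometry": "Point",
          "properties": {"float_column": "float", "int_column": "int"}}
patched = widen_int_fields(schema, {"int_column": [1307500192816, 1307500170116]})
print(patched["properties"]["int_column"])  # -> int:18
```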
