Reputation: 11
I have an integer variable "Sector" that gets converted to float when I add it to a pandas DataFrame, but I want to keep it as an integer. I'm not sure why this is happening. I am working in a Jupyter notebook.
The code:
sector=0
last_sector=1
for sector in range(last_sector,83):
    try:
        address = 'Singapore'+', '+str(sector)
        geolocator = Nominatim(user_agent="to_explorer")
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        print('The geographical coordinates for {} are {}, {}.'.format(address,latitude, longitude))
        sg_sectors = sg_sectors.append({'Sector': sector,
                                        'Latitude': latitude,
                                        'Longitude': longitude}, ignore_index=True)
    except:
        last_sector=int(sg_sectors['Sector'].max())
        print('Coordinates for sectors up to ',last_sector,' have already been gathered')
The output is:
   Sector  Latitude   Longitude
0     1.0  1.339782  103.973006
1     2.0  1.386609  103.851935
2     3.0  1.276690  103.869153
...
How can I keep it as integer?
Upvotes: 1
Views: 61
Reputation: 19905
The reason is this line, which is a pandas antipattern:
sg_sectors = sg_sectors.append({'Sector': sector,
                                'Latitude': latitude,
                                'Longitude': longitude}, ignore_index=True)
You are creating a new DataFrame every iteration. This probably won't matter in this specific case because your dataset is relatively small, but if you scale up, it will. A lot.
This also has the unfortunate side effect of widening the types used to the narrowest common supertype, which is, in this case, float. The dict you pass to append is first converted to a Series, and a Series holds a single dtype. In other words, sector is originally an int, but because latitude and longitude are floats, sector is itself widened to a float.
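The widening can be reproduced without a DataFrame at all. This is a minimal sketch, assuming the dict row is turned into a Series before being appended (which is how DataFrame.append handled dicts); the values are taken from the first row of the output in the question.

```python
import pandas as pd

# One row as a dict: Sector is a Python int, the coordinates are floats.
row = pd.Series({'Sector': 1, 'Latitude': 1.339782, 'Longitude': 103.973006})

# A Series has one dtype for all of its values, so the int is widened.
print(row.dtype)
print(row['Sector'])
```

The Series comes out as float64, so Sector is already 1.0 before it ever reaches the DataFrame.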
If you want to avoid this, instead collect your values in a list by defining, say, sg_sector_data = [] at the start. Then, in the loop, you can have this:
sector_data = {'Sector': sector, 'Latitude': latitude, 'Longitude': longitude}
sg_sector_data.append(sector_data)
And finally, at the end, create your DataFrame with sg_sectors = pd.DataFrame(sg_sector_data).
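Put together, the approach looks roughly like the sketch below. To keep it runnable offline, the geocoder call is replaced by fake_geocode, a hypothetical stand-in that returns made-up coordinates; in your notebook you would keep the Nominatim lookup from the question there.

```python
import pandas as pd

def fake_geocode(sector):
    # Hypothetical placeholder for geolocator.geocode(...); the values are made up.
    return 1.30 + sector / 1000, 103.80 + sector / 1000

sg_sector_data = []  # plain Python list, one dict per row
for sector in range(1, 4):
    latitude, longitude = fake_geocode(sector)
    sector_data = {'Sector': sector, 'Latitude': latitude, 'Longitude': longitude}
    sg_sector_data.append(sector_data)  # list.append, not DataFrame.append

# Build the DataFrame once; each column gets its own dtype.
sg_sectors = pd.DataFrame(sg_sector_data)
print(sg_sectors.dtypes)
```

Because the DataFrame is built column by column from the list, Sector stays an integer column while Latitude and Longitude are floats. This also keeps working on pandas 2.0 and later, where DataFrame.append has been removed.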
Upvotes: 1
Reputation: 449
You can force columns to a certain data type by applying astype.
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5), columns=['a'])
df.a = df.a.astype(float)  # cast a single column to float
print(df)
     a
0  0.0
1  1.0
2  2.0
3  3.0
4  4.0
df = df.astype({'a': int})  # cast only the columns named in the dict
print(df)
   a
0  0
1  1
2  2
3  3
4  4
You can apply astype to a single column, as I did in the first step to create a float column, or to the whole DataFrame at once with df.astype(float). You can also limit the effect to certain columns by passing a dictionary, as I did to make the column an integer again.
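Applied to the DataFrame from the question, this could look like the sketch below. The frame is rebuilt by hand from the three rows shown in the question's output, so it only assumes the column names used there.

```python
import pandas as pd

# Rebuild the float-typed frame from the question's output.
sg_sectors = pd.DataFrame({
    'Sector': [1.0, 2.0, 3.0],
    'Latitude': [1.339782, 1.386609, 1.276690],
    'Longitude': [103.973006, 103.851935, 103.869153],
})

# Cast only the Sector column back to integer; the coordinates stay float.
sg_sectors = sg_sectors.astype({'Sector': int})
print(sg_sectors.dtypes)
```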
Hope that helps. Best, lepakk
Upvotes: 0