Xavier Martínez

Reputation: 11

An integer variable gets converted to float when I add it to a DataFrame. How can I keep it as an integer?

I have an integer variable, Sector, that gets converted to float when I add it to a pandas DataFrame, but I want to keep it as an integer. I'm not sure why this is happening. I am working in a Jupyter notebook.

The code:

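from geopy.geocoders import Nominatim
import pandas as pd

# Assumed: sg_sectors was created earlier in the notebook, e.g. like this
sg_sectors = pd.DataFrame(columns=['Sector', 'Latitude', 'Longitude'])

last_sector = 1
geolocator = Nominatim(user_agent="to_explorer")  # one geocoder for the whole loop
for sector in range(last_sector, 83):
    try:
        address = 'Singapore' + ', ' + str(sector)
        location = geolocator.geocode(address)
        latitude = location.latitude
        longitude = location.longitude
        print('The geographical coordinates for {} are {}, {}.'.format(address, latitude, longitude))
        # Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
        sg_sectors = sg_sectors.append({'Sector': sector,
                                        'Latitude': latitude,
                                        'Longitude': longitude}, ignore_index=True)
    except Exception:  # e.g. the lookup failed or timed out
        last_sector = int(sg_sectors['Sector'].max())
        print('Coordinates for sectors up to', last_sector, 'have already been gathered')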

The output is:

   Sector  Latitude   Longitude
0     1.0  1.339782  103.973006
1     2.0  1.386609  103.851935
2     3.0  1.276690  103.869153
...


How can I keep it as integer?

Upvotes: 1

Views: 61

Answers (2)

gmds

Reputation: 19905

The reason is this line, which is a pandas antipattern:

sg_sectors = sg_sectors.append({'Sector': sector,
                                'Latitude': latitude,
                                'Longitude': longitude}, ignore_index=True)

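You are creating a new DataFrame on every iteration, because append copies all the existing rows into a new object each time. This probably won't matter here because your dataset is small, but the total cost grows quadratically with the number of rows, so if you scale up, it will matter a lot.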

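This also has the unfortunate side effect of widening the types to their narrowest common supertype, which in this case is float. The dict you pass is turned into a single row first, and that row can hold only one dtype. In other words, sector is originally an int, but because latitude and longitude are floats, sector is widened to a float as well. Note also that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this line no longer runs on current versions.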
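A minimal sketch of the widening, assuming (as in pandas 1.x) that append converts the dict into a Series before adding it as a row. The coordinates are just the first row of the question's output. A Series holds one dtype for all its values, while a DataFrame holds one dtype per column:

```python
import pandas as pd

# One row built from the dict: a single Series, so a single dtype for all values
row = pd.Series({'Sector': 1, 'Latitude': 1.339782, 'Longitude': 103.973006})
print(row.dtype)      # float64: the int was widened to match the floats
print(row['Sector'])  # 1.0

# A DataFrame built from the same dict keeps a separate dtype per column
df = pd.DataFrame([{'Sector': 1, 'Latitude': 1.339782, 'Longitude': 103.973006}])
print(df.dtypes)      # Sector stays an integer dtype here
```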

To avoid this, collect your values in a list instead: define, say, sg_sector_data = [] at the start. Then, in the loop, you can have this:

sector_data = {'Sector': sector, 'Latitude': latitude, 'Longitude': longitude}

sg_sector_data.append(sector_data)

And finally, at the end, create your DataFrame with sg_sectors = pd.DataFrame(sg_sector_data). The constructor infers a dtype for each column separately, so Sector stays an integer.
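Put together, the list-then-DataFrame approach might look like the sketch below. To keep it runnable offline, the geocoding call is replaced by a hypothetical lookup table, KNOWN_COORDS, filled with the first three rows of the question's output; in the real loop you would keep the geolocator.geocode(...) call and the try/except:

```python
import pandas as pd

# Hypothetical stand-in for geolocator.geocode(...) so the sketch runs offline;
# the values are the first three rows shown in the question.
KNOWN_COORDS = {
    1: (1.339782, 103.973006),
    2: (1.386609, 103.851935),
    3: (1.276690, 103.869153),
}

sg_sector_data = []  # one dict per row, collected in a plain Python list
for sector in range(1, 4):
    latitude, longitude = KNOWN_COORDS[sector]
    sg_sector_data.append({'Sector': sector,
                           'Latitude': latitude,
                           'Longitude': longitude})

# Build the DataFrame once at the end; dtypes are inferred column by column
sg_sectors = pd.DataFrame(sg_sector_data)
print(sg_sectors.dtypes)
```

Besides keeping Sector as an integer, this builds the DataFrame only once, so it avoids the repeated copying described above.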

Upvotes: 1

Lepakk

Reputation: 449

You can cast columns to a specific data type with astype.

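import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5), columns=['a'])
df.a = df.a.astype(float)    # cast column 'a' to float
print(df)
#      a
# 0  0.0
# 1  1.0
# 2  2.0
# 3  3.0
# 4  4.0

df = df.astype({'a': int})   # a dict limits the cast to the listed columns
print(df)
#    a
# 0  0
# 1  1
# 2  2
# 3  3
# 4  4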

Passing a single type to DataFrame.astype (e.g. df.astype(float)) casts every column, which here gives a float DataFrame. Passing a dictionary, as in the second call, restricts the cast to the listed columns, which is how the column was turned back into integers.
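Applied to the question, a sketch might look like this. The frame below is a hypothetical reconstruction of the first three rows of the question's output, with Sector already widened to float. One caveat: astype(int) raises if the column contains NaN, for example from a failed lookup; the nullable 'Int64' dtype is one way to keep integers alongside missing values:

```python
import pandas as pd

# Hypothetical frame mimicking the question's output: Sector came out as float
sg_sectors = pd.DataFrame({'Sector': [1.0, 2.0, 3.0],
                           'Latitude': [1.339782, 1.386609, 1.276690],
                           'Longitude': [103.973006, 103.851935, 103.869153]})

# Cast only Sector; Latitude and Longitude stay float64
sg_sectors = sg_sectors.astype({'Sector': int})
print(sg_sectors.dtypes)

# With a missing value, astype(int) would fail; nullable 'Int64' stores it as <NA>
with_gap = pd.Series([1.0, None, 3.0])
print(with_gap.astype('Int64'))
```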

Hope that helps. Best, lepakk

Upvotes: 0
