KubiK888

Reputation: 4723

Is it necessary or beneficial to convert a pandas column from object to string or int/float type?

I have a pandas df with two variables:

id    name
011    Peter Parker
022    Warners Brother
101    Bruce Wayne

Currently both of them are of object type.

Say I want to create smaller dataframes by filtering on some conditions:

df_small = df.loc[df['id']=='011']
df_small2 = df.loc[df['name']=='Peter Parker']

I have thought about, and seen people, converting object-type columns into a more specific data type. My question is: do I need to do that at all if I can already filter them with a string comparison (as above)? What are the benefits of converting them into a specific string or int/float type?

Upvotes: 2

Views: 907

Answers (1)

sacuL

Reputation: 51335

You asked about the benefits of converting from string or object dtypes. There are at least two I can think of right off the bat. Take the following dataframe, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'int_col': np.random.randint(0, 10, 10000), 'str_col': np.random.choice(list('1234567980'), 10000)})

>>> df.head()
   int_col str_col
0        7       0
1        0       1
2        1       8
3        6       1
4        6       0

This dataframe has 10000 rows, with one int column and one object (i.e. string) column for comparison.

Memory advantage:

The integer column takes a lot less memory than the object column:

>>> import sys
>>> sys.getsizeof(df['int_col'])
80104
>>> sys.getsizeof(df['str_col'])
660104
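As a cross-check on the numbers above, here is a minimal sketch using `DataFrame.memory_usage(deep=True)` instead of `sys.getsizeof`. This is a swapped-in way of measuring, not what produced the figures above, and the seed and the explicit `dtype=object` are my own assumptions to keep the comparison reproducible. Without `deep=True`, pandas only counts the 8-byte pointers for an object column and not the Python strings they point to.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, only so the sketch is reproducible.
rng = np.random.default_rng(0)

df = pd.DataFrame({
    "int_col": rng.integers(0, 10, 10000),
    # Force object dtype so the column holds Python str objects,
    # matching the situation described in the question.
    "str_col": pd.Series(rng.choice(list("1234567980"), 10000), dtype=object),
})

# deep=True also counts the string objects themselves, not just the pointers.
mem = df.memory_usage(deep=True, index=False)
int_bytes = mem["int_col"]
str_bytes = mem["str_col"]

print(mem)
print("object column is about", round(str_bytes / int_bytes, 1), "times larger")
```

The exact byte counts depend on your Python and pandas versions, so treat the ratio as indicative only.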

Speed advantage:

Since your example is about filtering, take a look at the speed difference when filtering on integers instead of strings:

import timeit

def filter_int(df=df):
    return df.loc[df.int_col == 1]


def filter_str(df=df):
    return df.loc[df.str_col == '1']

>>> timeit.timeit(filter_int, number=100) / 100
0.0006298311000864488
>>> timeit.timeit(filter_str, number=100) / 100
0.0016585511100129225

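In this run, filtering on the string column took roughly 2.6 times as long as filtering on the integer column (about 1.66 ms versus 0.63 ms per call). On larger dataframes, or in code that filters repeatedly, a difference like this can add up.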
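One caveat for the dataframe in the question: converting `id` to an integer drops the leading zeros, so `'011'` becomes `11`. A minimal sketch of the conversion, assuming the three rows shown in the question; the column name `id_int` is hypothetical and only there so both versions can be compared side by side:

```python
import pandas as pd

# The three rows from the question, with both columns stored as strings.
df = pd.DataFrame({
    "id": ["011", "022", "101"],
    "name": ["Peter Parker", "Warners Brother", "Bruce Wayne"],
})

# Convert the string ids to integers; "011" becomes 11.
# `id_int` is a hypothetical column name used for illustration.
df["id_int"] = pd.to_numeric(df["id"])

# Filtering now compares against an integer, not the string "011".
by_int = df.loc[df["id_int"] == 11]

# The original string comparison still works on the untouched column.
by_str = df.loc[df["id"] == "011"]

print(df.dtypes)
print(by_int)
```

If the leading zeros are meaningful (for example, the id is really a code and not a number), it may be better to leave that column as a string and accept the extra memory and slower comparisons.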

Upvotes: 3
