regina_adams

Reputation: 1

How to vectorize my code in pandas? It is long and inefficient

I wrote the code below and it takes too long to run. I was advised to vectorize this operation, but so far the only vectorization examples I have found are for multiplication. Here is my code:

my_dict = {}
for i in list(df.index):
    my_dict[i] = myClass(df.loc[i, 'name'])
    my_dict[i].class_method({'col1': df.loc[i, 'col1']})
    my_dict[i].class_method({'col2': df.loc[i, 'col2']})
    ...

and so on until 'col17'. Someone reviewed my code and said to 'use the fact that df is a dataframe and not loop through and don't use the expensive .loc() operation'.

The only thing I could come up with is:

my_list = ['col1', 'col2', ..., 'col17']
my_dict = {}

for i in list(df.index):
    my_dict[i] = myClass(df.loc[i, 'name'])
    for col in my_list:
        my_dict[i].class_method({col: df.loc[i, col]})

but this does not really vectorize anything. Are there ways to vectorize this in pandas that I don't know about?

Upvotes: 0

Views: 56

Answers (1)

Learning is a mess

Reputation: 8277

.loc can be expensive because on every call it has to work out whether you are passing a slice or an iterable of keys. Converting your dataframe to a dict of dicts should give faster lookups:

my_list = ['col1', 'col2', ..., 'col17']
my_dict = {}

# df.T.to_dict() yields {row_label: {column: value}}
for row_key, row in df.T.to_dict().items():
    my_dict[row_key] = myClass(row['name'])
    for col in my_list:
        my_dict[row_key].class_method({col: row[col]})
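
For a version you can run end to end, here is a minimal sketch of the same idea. MyClass is a hypothetical stand-in for the question's myClass (its real behavior is not shown, so I assume class_method just records what it is given), and the two-row dataframe is made-up data. It uses df.to_dict("index"), which should give the same row-keyed dict of dicts as df.T.to_dict() without the transpose.

```python
import pandas as pd


class MyClass:
    """Hypothetical stand-in for the question's myClass."""

    def __init__(self, name):
        self.name = name
        self.values = {}

    def class_method(self, mapping):
        # Assumed behavior: simply record whatever mapping is passed in.
        self.values.update(mapping)


def build_objects(df, cols):
    """Build one MyClass per row without calling .loc inside the loop."""
    result = {}
    # to_dict("index") returns {row_label: {column: value}}.
    for row_key, row in df.to_dict("index").items():
        obj = MyClass(row["name"])
        for col in cols:
            obj.class_method({col: row[col]})
        result[row_key] = obj
    return result


# Made-up example data; the real frame has columns up to 'col17'.
df = pd.DataFrame(
    {"name": ["a", "b"], "col1": [1, 2], "col2": [3, 4]},
    index=["r1", "r2"],
)
objects = build_objects(df, ["col1", "col2"])
```

Note that this is still a Python-level loop, so it is not vectorization in the NumPy sense. Any speedup comes from replacing the repeated .loc calls with plain dict lookups.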

Upvotes: 0
