Reputation: 1
I wrote the code below and it is taking too long to run. I was advised to vectorize the operation, but so far I have only found examples involving multiplication. Here is my code:
my_dict = {}
for i in list(df.index):
    my_dict[i] = myClass(df.loc[i, 'name'])
    my_dict[i].class_method({'col1': df.loc[i, 'col1']})
    my_dict[i].class_method({'col2': df.loc[i, 'col2']})
    ...
and so on up to 'col17'. Someone who reviewed my code told me to 'use the fact that df is a dataframe and not loop through and don't use the expensive .loc() operation'.
The only thing I could come up with is:
my_list = ['col1', 'col2', ..., 'col17']
my_dict = {}
for i in list(df.index):
    my_dict[i] = myClass(df.loc[i, 'name'])
    for col in my_list:
        my_dict[i].class_method({col: df.loc[i, col]})
but this is not really vectorizing anything... Are there any pandas vectorization tricks that I don't know about?
Upvotes: 0
Views: 56
Reputation: 8277
.loc can be expensive, because on every call it has to check whether you are passing a slice or an iterable of keys. Converting your dataframe to a dict of dicts should give faster lookups:
my_list = ['col1', 'col2', ..., 'col17']
my_dict = {}
# df.T.to_dict() maps each index label to a {column: value} dict for that row
for row_key, row in df.T.to_dict().items():
    my_dict[row_key] = myClass(row['name'])
    for col in my_list:
        my_dict[row_key].class_method({col: row[col]})
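For reference, here is a minimal self-contained sketch of the same idea that you can run as-is. The MyClass below is only a hypothetical stand-in (your real class is not shown), and the two-column dataframe is made-up sample data. It uses df.to_dict('index'), which as far as I know gives the same {index: {column: value}} mapping as df.T.to_dict() without transposing first:

```python
import pandas as pd


class MyClass:
    """Hypothetical stand-in for the asker's myClass; the real one is not shown."""

    def __init__(self, name):
        self.name = name
        self.values = {}

    def class_method(self, mapping):
        # Assumption: the real method consumes a {column: value} dict.
        self.values.update(mapping)


# Made-up sample data with two value columns instead of seventeen.
df = pd.DataFrame(
    {"name": ["a", "b"], "col1": [1, 2], "col2": [3, 4]},
    index=[10, 20],
)
my_list = ["col1", "col2"]

# Convert once to plain Python dicts: {index_label: {column: value}}.
rows = df.to_dict("index")

my_dict = {}
for row_key, row in rows.items():
    obj = MyClass(row["name"])
    for col in my_list:
        # Plain dict lookup instead of a per-cell df.loc call.
        obj.class_method({col: row[col]})
    my_dict[row_key] = obj
```

This is still a Python loop, so it is not vectorized in the NumPy sense; it only moves the per-cell lookups out of pandas. How much it helps depends on what class_method actually does.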
Upvotes: 0