Dervin Thunk


Translating code to numpy for better performance

I have this:

partial = {}
for d in devs["d"]:
    for k in a1km:
        total = len(cp[(cp["r"]==d) & (cp["s"]==k)])
        partial.update({str(d)+str(k): total})

Variables cp and devs are pandas DataFrames, and a1km is a dictionary whose keys are sites, each mapped to all sites within 1 km of it (precalculated). For each d and each site k, I want the number of records in the cp DataFrame that match the query, stored as total, so the output would be:

d, k, total

I've never worked with NumPy and I'm trying to learn it as fast as I can, but the library is too big for me to take in given the time constraints in my lab. So my question is: how do I "translate" the code above to NumPy to improve performance?


Answers (1)

FBruzzesi


You can filter the DataFrame first and then count the matching rows with pandas.DataFrame.groupby:

tmp = cp[cp['r'].isin(devs['d'].unique()) & cp['s'].isin(list(a1km))]

result_df = tmp.groupby(['r','s']).size()

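Note that this can still be slow on large DataFrames, although it replaces the Python-level nested loop with a single filter and a single groupby.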
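Since the question asks about NumPy specifically, here is a minimal sketch of a lower-level variant (a swapped-in technique, not part of the answer above). It maps each row of cp to integer positions with pandas.Index.get_indexer and then counts every (d, k) pair in a single np.bincount call. The toy values in cp, devs and a1km are hypothetical; only the column names r, s and d come from the question. Whether this beats groupby depends on the data, so it is worth timing both.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only the column names (r, s in cp; d in devs)
# come from the question. The values and a1km contents are made up.
cp = pd.DataFrame({"r": [1, 1, 2, 2, 2, 3], "s": ["a", "b", "a", "a", "c", "a"]})
devs = pd.DataFrame({"d": [1, 2]})
a1km = {"a": ["b"], "b": ["a", "c"]}

ds = devs["d"].unique()  # d values to count for (get_indexer needs unique values)
ks = list(a1km)          # sites to count for: the dict keys, as in the original loop

# Position of each row's r in ds and s in ks; -1 means "not requested".
r_pos = pd.Index(ds).get_indexer(cp["r"])
s_pos = pd.Index(ks).get_indexer(cp["s"])

# Keep only rows where both r and s are among the requested values.
keep = (r_pos >= 0) & (s_pos >= 0)

# Encode each (d, k) pair as one integer, count all pairs in one pass,
# then reshape to a len(ds) x len(ks) table of counts (zeros included).
flat = r_pos[keep] * len(ks) + s_pos[keep]
counts = np.bincount(flat, minlength=len(ds) * len(ks)).reshape(len(ds), len(ks))

# Same string keys as the question's loop.
partial = {str(d) + str(k): int(counts[i, j])
           for i, d in enumerate(ds) for j, k in enumerate(ks)}
```

Because minlength covers every pair, combinations with no matching rows get an explicit 0, just like the original loop.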

Then, to turn result_df into a dictionary with the same keys as your loop:

partial = {str(k[0]) + str(k[1]): v for k,v in result_df.to_dict().items()}
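One behavioural difference from the question's loop: the loop stores an entry for every (d, k) pair, including pairs with no matching rows (total 0), whereas groupby(...).size() only returns pairs that actually occur. A minimal sketch, on hypothetical toy data (only the column names come from the question), that reindexes over all pairs with pandas.MultiIndex.from_product so the dictionary matches the loop's output:

```python
import pandas as pd

# Hypothetical toy data shaped like the question's cp, devs and a1km.
cp = pd.DataFrame({"r": [1, 1, 2, 2, 2, 3], "s": ["a", "b", "a", "a", "c", "a"]})
devs = pd.DataFrame({"d": [1, 2]})
a1km = {"a": ["b"], "b": ["a", "c"]}

ds = devs["d"].unique()
ks = list(a1km)  # iterating a dict yields its keys, as in the original loop

# Filter and count, as in the answer.
tmp = cp[cp["r"].isin(ds) & cp["s"].isin(ks)]
counts = tmp.groupby(["r", "s"]).size()

# groupby only returns pairs that occur; reindex over every (d, k) pair
# so missing combinations get an explicit 0, like the nested loop does.
full_index = pd.MultiIndex.from_product([ds, ks], names=["r", "s"])
counts = counts.reindex(full_index, fill_value=0)

partial = {str(d) + str(k): int(v) for (d, k), v in counts.items()}
```

If the string keys are not required, counts.to_dict() keeps (d, k) tuples as keys, which avoids collisions that plain concatenation can produce (for example d=1, k="23" and d=12, k="3" both give "123").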

