ger.code

Reputation: 13

Pandas and lists efficiency problem? Script takes too long

I'm fairly new to Python and pandas. I have a CSV with around 100k rows, of which only three columns are of interest:

idd date prod
1 201601 1000
1 200605 2000
2 200102 1500
2 200903 1200
3 ....... . .......

I need to group by idd, sort by date, and then transpose the 'prod' column so that, for each idd, the earliest 'prod' value (by date) ends up in the first column after idd, the next one in the second column, and so on, dropping the date value. For my example the result would be:

idd '1' '2' '3'
1 2000 1000 ...
2 1500 1200 ...
3 ... ..... ...

I also filter for idds that have at least "nrows" reported values, since I am not interested in idds with fewer than that. Since I have read that iterating over the groups made by groupby is not efficient, I made a list of the group names resulting from groupby and queried the original dataframe with them, but it still takes too long (about 5 minutes) to run. Maybe I am doing something wrong? I tried to keep the number of objects to a minimum, to loop using iloc and for loops to increase efficiency, and to use the list of names instead of "get_group", but maybe I am missing something. Here is my code:

nrows = 36
for name in grouped_df.groups.keys():
    for i in range(0, len(origin_df[origin_df.idd == name]['idd'])):

        if len(origin_df[origin_df.idd == name]['idd']) >= nrows:

            aux_df = origin_df[origin_df.idd == name]
            aux_df.sort_values(by=['date'], inplace=True)
            idd = name
            prod = aux_df.iloc[i, 1]
            new_df.loc[idd, i + 1] = prod
            new_df.loc[idd, 'idd'] = idd

This is my first question on this site, so if I made any formatting mistakes please forgive me. All suggestions are welcome! Thanks in advance :)

Upvotes: 1

Views: 54

Answers (1)

Scott Boston

Reputation: 153460

Try:

df.set_index(['idd', df.groupby('idd').cumcount() + 1])['prod'].unstack()

Output:

        1     2
idd            
1    1000  2000
2    1500  1200
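
Note that the one-liner above numbers the rows in the order they appear in the frame, so it matches the expected result in the question only if the rows are already sorted by date within each idd. It also does not apply the "nrows" filter. A minimal sketch that adds both steps might look like the following; the helper name pivot_by_date and the small nrows value are assumptions made for illustration, and the sample frame just reproduces the four rows from the question:

```python
import pandas as pd

# Sample data copied from the question's example rows.
df = pd.DataFrame({
    "idd": [1, 1, 2, 2],
    "date": [201601, 200605, 200102, 200903],
    "prod": [1000, 2000, 1500, 1200],
})

def pivot_by_date(df, nrows):
    """Hypothetical helper: one row per idd, 'prod' values in date order."""
    # Size of each idd group, broadcast back to every row of that group.
    counts = df.groupby("idd")["idd"].transform("size")
    # Keep only idds with at least `nrows` reported values.
    kept = df[counts >= nrows]
    # Sort so the per-group counter below follows date order.
    kept = kept.sort_values(["idd", "date"])
    # 1-based position of each row within its idd group.
    pos = kept.groupby("idd").cumcount() + 1
    # Move the position into the columns: one column per position.
    return kept.set_index(["idd", pos])["prod"].unstack()

nrows = 2  # the question uses 36; 2 is used here so the sample data survives
wide = pivot_by_date(df, nrows)
print(wide)
```

Everything here is vectorized, so there is no Python-level loop over groups or rows. If some idds have more rows than others, the shorter ones are padded with NaN in the trailing columns.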

Upvotes: 1
