Is there a way to increase the speed of the loop, or a faster way to do the same thing without using a for loop?

Question

I have a huge dataframe (4 million rows and 25 columns). I am trying to investigate 2 categorical columns. One of them has around 5000 levels (app_id) and the other has 50 levels (app_category).

I have seen that for each level in app_id there is a unique value of app_category. How can I verify that in code?

I have tried something like this:

app_id_unique = list(train['app_id'].unique())

for unique in app_id_unique:
    train.loc[train['app_id'] == unique].app_category.nunique()

This code takes forever.

jezrael · Accepted Answer

I think you need groupby with nunique:

train.groupby('app_id').app_category.nunique()
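
To turn those per-group counts into a yes/no check, something like the following sketch may help. The small `train` frame here is made-up toy data standing in for the real one, and the names `categories_per_app`, `is_unique_mapping` and `violations` are only illustrative. Note that `nunique` skips NaN by default, so an app_id whose categories are all missing would show a count of 0 rather than 1.

```python
import pandas as pd

# Toy stand-in for the real `train` frame (hypothetical data).
train = pd.DataFrame({
    "app_id": ["a1", "a1", "a2", "a3", "a3", "a3"],
    "app_category": ["games", "games", "news", "tools", "tools", "tools"],
})

# Number of distinct categories seen for each app_id.
categories_per_app = train.groupby("app_id")["app_category"].nunique()

# True only if every app_id maps to exactly one category.
is_unique_mapping = bool((categories_per_app == 1).all())

# app_ids with more than one category (empty for this toy data).
violations = categories_per_app[categories_per_app > 1]

print(is_unique_mapping)
print(violations)
```

If `is_unique_mapping` comes back False, `violations` lists the app_ids worth inspecting. This runs as a single grouped pass over the data instead of one boolean filter per app_id, which is where the loop spends its time.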
