How to append a list after looping over a dataframe column?

Question

Assuming I have a dataframe as follows:

import pandas as pd

df = pd.DataFrame({
    'ids': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
    'values': ['5', '8', '7', '12', '2', '1', '3', '15', '4'],
}, dtype='int32')



ids  values
1    5
1    8
1    7
1    12
2    2
2    1
2    3
3    15
3    4

I would like to loop over the values column, check which values are greater than 6, and append the corresponding id from the ids column to an initially empty list.

Even if an id (say 3) has multiple values (4 and 15) and only one of them is greater than 6, I would like that id to be appended to the list.

Example: after looping over the dataframe df above, I would like the following output:

more = [1, 3]
less = [2]

with more = [] and less = [] being pre-initialized empty lists.

What I have so far: I tried implementing this, but I am surely making a mistake somewhere. The code I have:

less = []
more = []
for value in df['values']:
    for id in df['ids']:
        if (value > 6):
            more.append(id)
        else:
            less.append(id)

Chris Adams · Accepted Answer

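Use groupby and boolean indexing to create your lists. Taking the maximum value per id means an id ends up in more as soon as any one of its values is greater than 6. This will be much faster than looping: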

g = df.groupby('ids')['values'].max()
mask = g.gt(6)
more = g[mask].index.tolist()
less = g[~mask].index.tolist()

print(more)
print(less)

[1, 3]
[2]
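
If an explicit loop is still preferred, the following is a minimal sketch of one way to do it, not part of the answer above. It assumes the columns already hold integers (the frame is rebuilt here with integer literals so the snippet is self-contained), and the names more_ids and all_ids are hypothetical helpers. The key difference from the nested loop in the question is that zip pairs each value with the id from its own row:

```python
import pandas as pd

# Same data as in the question, written with integer literals (an assumption
# made here so the snippet runs on its own).
df = pd.DataFrame({
    'ids': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'values': [5, 8, 7, 12, 2, 1, 3, 15, 4],
})

# Walk the two columns in lockstep and remember every id that has
# at least one value greater than 6.
more_ids = set()
for id_, value in zip(df['ids'].tolist(), df['values'].tolist()):
    if value > 6:
        more_ids.add(id_)

# unique() keeps ids in order of first appearance.
all_ids = df['ids'].unique().tolist()
more = [i for i in all_ids if i in more_ids]
less = [i for i in all_ids if i not in more_ids]

print(more)
print(less)
```

Using a set avoids appending the same id several times when more than one of its values passes the threshold.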
