user8207612

How to append a list after looping over a dataframe column?

Assuming I have a dataframe as follows:

import pandas as pd

df = pd.DataFrame({'ids': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
                   'values': ['5', '8', '7', '12', '2', '1', '3', '15', '4']},
                  dtype='int32')



ids  values
1    5
1    8
1    7
1    12
2    2
2    1
2    3
3    15
3    4

What I would like to do is loop over the values column, check which values are greater than 6, and append the corresponding id from the ids column to an initially empty list.

If an id (say 3) has multiple values (15 and 4) and only one of them is greater than 6, I would still like that id to be appended to the list.

Example: Assuming we run a loop over the above mentioned dataframe df, I would like the output as follows:

more = [1, 3]
less = [2]

where more = [] and less = [] are pre-initialized empty lists.

What I have so far: I tried to implement this, but I am clearly making a mistake somewhere. The code I have:

less = []
more = []
for value in df['values']:
    for id in df['ids']:
        if (value > 6):
            more.append(id)
        else:
            less.append(id)

Upvotes: 1

Views: 82

Answers (2)

nikhilbalwani

Reputation: 930

You can use boolean indexing to select the rows whose value is greater than 6 and collect their ids into a set of unique ids:

setA = set(df[df['values'] > 6]['ids'])

This creates a set of all ids in the dataframe:

setB = set(df['ids'])

Now,

more = list(setA)

and for less, take the set difference:

less = list(setB.difference(setA))

That's it!
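
For reference, here is a minimal self-contained sketch of the set approach. It makes two assumptions that are not in the snippets above: the data is written with integer literals rather than strings cast to int32, and the results are passed through sorted(), because Python sets are unordered and list(setA) does not guarantee any particular order.

```python
import pandas as pd

# Same data as in the question, written with integer literals (assumption).
df = pd.DataFrame({
    'ids':    [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'values': [5, 8, 7, 12, 2, 1, 3, 15, 4],
})

# ids that have at least one value greater than 6
setA = set(df.loc[df['values'] > 6, 'ids'].tolist())
# every id that appears in the dataframe
setB = set(df['ids'].tolist())

# sorted() is an addition here: it makes the output order deterministic
more = sorted(setA)
less = sorted(setB - setA)

print(more)
print(less)
```

With the sample data this should give more = [1, 3] and less = [2].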

Upvotes: 0

Chris Adams

Reputation: 18647

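Use groupby and boolean indexing to create your lists. An id belongs in more exactly when its largest value is greater than 6, so it is enough to take the maximum value per id and split on that. This will be much faster than looping: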

g = df.groupby('ids')['values'].max()
mask = g.gt(6)
more = g[mask].index.tolist()
less = g[~mask].index.tolist()

print(more)
print(less)

[1, 3]
[2]
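
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name split_ids and the threshold parameter are hypothetical, and the data is written with integer literals for brevity.

```python
import pandas as pd

def split_ids(df, threshold=6):
    """Return (more, less) lists of ids.

    An id lands in `more` if its largest value is greater than `threshold`,
    otherwise in `less`. Hypothetical helper, not part of pandas.
    """
    g = df.groupby('ids')['values'].max()   # largest value per id
    mask = g.gt(threshold)                  # True where that maximum exceeds the threshold
    return g[mask].index.tolist(), g[~mask].index.tolist()

# Same data as in the question, written with integer literals (assumption).
df = pd.DataFrame({
    'ids':    [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'values': [5, 8, 7, 12, 2, 1, 3, 15, 4],
})

more, less = split_ids(df)
print(more)
print(less)
```

Because groupby sorts the group keys by default, the ids in both lists come back in ascending order.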

Upvotes: 4
