Reputation:
Assuming I have a dataframe as follows:
df = pd.DataFrame({ 'ids' : ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
'values' : ['5', '8', '7', '12', '2', '1', '3', '15', '4']
}, dtype='int32')
ids values
1 5
1 7
1 8
1 12
2 1
2 3
2 2
3 4
3 15
What I would like to do is loop over the values column, check which values are greater than 6, and append the corresponding id from the ids column to an empty list.
Even if an id (say 3) has multiple values (4 and 15) and only one of them is greater than 6, I would like that id to be appended to the list.
Example: Assuming we run a loop over the above mentioned dataframe df, I would like the output as follows:
more = [1, 3]
less = [2]
with more = [] and less = [] being pre-initialized empty lists.
What I have so far: I tried implementing this, but I am surely making a mistake somewhere. The code I have:
less = []
more = []
for value in df['values']:
    for id in df['ids']:
        if (value > 6):
            more.append(id)
        else:
            less.append(id)
Upvotes: 1
Views: 82
Reputation: 930
You can use boolean indexing to select the rows whose values are greater than 6 and build a set of their unique ids:
setA = set(df[df['values'] > 6]['ids'])
This will create a set of all ids in the dataframe:
setB = set(df['ids'])
Now,
more = list(setA)
and for less, take the set difference:
less = list(setB.difference(setA))
That's it!
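For reference, here is a minimal end-to-end sketch of the set approach above. It assumes the ids and values are stored as plain integers rather than strings, and it uses sorted() instead of list() because sets are unordered, so list(setA) does not guarantee any particular order. The names ids_above and all_ids are only illustrative stand-ins for setA and setB.

```python
import pandas as pd

# Sample data from the question, written as integers (an assumption;
# the question builds it from strings with dtype='int32').
df = pd.DataFrame({
    'ids': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'values': [5, 8, 7, 12, 2, 1, 3, 15, 4],
})

# ids that have at least one value greater than 6
ids_above = set(df.loc[df['values'] > 6, 'ids'])

# every id present in the dataframe
all_ids = set(df['ids'])

# sorted() gives a deterministic order; plain list() on a set does not
more = sorted(ids_above)
less = sorted(all_ids - ids_above)

print(more)
print(less)
```

With the sample data this should give more = [1, 3] and less = [2].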
Upvotes: 0
Reputation: 18647
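Use groupby and boolean indexing to create your lists. Taking the maximum value per id means an id ends up in more as soon as any one of its values is greater than 6. This will be much faster than looping: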
g = df.groupby('ids')['values'].max()
mask = g.gt(6)
more = g[mask].index.tolist()
less = g[~mask].index.tolist()
print(more)
print(less)
[1, 3]
[2]
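As a variation on the same idea, the "at least one value greater than 6" condition can also be written directly with a per-group any() instead of max(). This is only a sketch of an alternative, assuming integer data as in the question; has_large is an illustrative name, not something from the original code.

```python
import pandas as pd

# Sample data from the question, written as integers (an assumption).
df = pd.DataFrame({
    'ids': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'values': [5, 8, 7, 12, 2, 1, 3, 15, 4],
})

# Boolean Series: True where the row's value exceeds 6,
# then reduced per id to "does this id have any such row?"
has_large = df['values'].gt(6).groupby(df['ids']).any()

more = has_large[has_large].index.tolist()
less = has_large[~has_large].index.tolist()

print(more)
print(less)
```

For the sample data this should produce the same two lists as the max()-based version; any() may read more naturally if the condition later becomes something other than a simple threshold.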
Upvotes: 4