Reputation: 341
I have a problem where I need to count already-used ids. My data set has the attributes id, time, and Bi,
and looks something like this:
id  time  Bi   | wanted_results  used
1   3     NAN  | 0               []
1   3     1    | 1               [1]
1   2     NAN  | 1               [1]
2   2     1    | 2               [1, 2]
2   1     1    | 2               [1, 2]
2   1     1    | 2               [1, 2]
Attribute description:
id - represents what we count
time - is used for the timeline, which goes from n to 0
Bi - represents whether the id was used at that time
used - shows which ids have been counted so far
So now I want the number of unique ids that have already been used. How can I group the data to store the used ids and get the wanted results?
Thank you!
Upvotes: 0
Views: 39
Reputation: 2110
You can use a combination of expanding and apply.
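import numpy as np
df['id'].expanding().apply(lambda x: len(np.unique(x)))
This returns a Series with the running count of distinct ids seen up to each row. Note that it counts every id, whether or not Bi is set.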
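For reference, here is a minimal sketch of what the expanding call gives on the sample data from the question, next to a variant that only counts ids flagged in Bi. The DataFrame construction, the assumption that NAN means a missing value, and the names all_ids, flagged, first_use and used_ids are mine, not from the question. The second part swaps expanding for a duplicated/cumsum approach.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NAN is assumed to be a missing value.
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [3, 3, 2, 2, 1, 1],
    "Bi":   [np.nan, 1, np.nan, 1, 1, 1],
})

# Running count of distinct ids, ignoring Bi.
all_ids = df["id"].expanding().apply(lambda x: len(np.unique(x)))

# Keep an id only where Bi == 1, flag the first flagged appearance
# of each id, and accumulate those flags into a running count.
flagged = df["id"].where(df["Bi"].eq(1))
first_use = flagged.notna() & ~flagged.duplicated()
used_ids = first_use.cumsum()

print(all_ids.tolist())
print(used_ids.tolist())
```

On this sample the first list is 1, 1, 1, 2, 2, 2 (as floats) and the second is 0, 1, 1, 2, 2, 2, which matches the wanted_results column in the question.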
Upvotes: 2
Reputation: 7058
You can do this by iterating over the DataFrame and adding the ids to a set:
df['wanted_result'] = 0
used_set = set()  # ids seen on earlier rows
for row in df.itertuples():
    # store the number of distinct ids seen before this row
    df.loc[row.Index, 'wanted_result'] = len(used_set)
    used_set.add(row.id)
Results in
   id  time   Bi  wanted_result
0   1     3  NAN              0
1   1     3    1              1
2   1     2  NAN              1
3   2     2    1              1
4   2     1    1              2
5   2     1    1              2
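Note that this output gives 1 for row 3, where the question asks for 2, because the loop counts ids from earlier rows only and ignores Bi. A possible variant that adds the id before counting, and only when Bi is 1, might look like the sketch below. The sample DataFrame, the reading of NAN as a missing value, and the extra used column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NAN is assumed to be a missing value.
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [3, 3, 2, 2, 1, 1],
    "Bi":   [np.nan, 1, np.nan, 1, 1, 1],
})

used_set = set()
counts, used = [], []
for row in df.itertuples():
    # Only ids flagged with Bi == 1 count as used (NaN compares unequal to 1).
    if row.Bi == 1:
        used_set.add(row.id)
    counts.append(len(used_set))
    used.append(sorted(used_set))  # snapshot of the used ids at this row

df["wanted_result"] = counts
df["used"] = pd.Series(used, index=df.index)
print(df)
```

Collecting the values in plain lists and assigning them once avoids writing to the DataFrame with df.loc on every iteration.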
Upvotes: 0