Reputation: 503
I am sorry I could not fit the whole problem into the title concisely. Pardon my English; I will explain my problem with an example.
Say I have this dataset:
import numpy as np
import pandas as pd

dff = pd.DataFrame(np.array([["2020-11-13", 0, 3, 4], ["2020-10-11", 1, 3, 4], ["2020-11-13", 2, 1, 4],
                             ["2020-11-14", 0, 3, 4], ["2020-11-13", 1, 5, 4], ["2020-11-14", 2, 2, 4],
                             ["2020-11-12", 1, 1, 4], ["2020-11-14", 1, 2, 4], ["2020-11-15", 2, 5, 4],
                             ["2020-11-11", 0, 1, 2], ["2020-11-15", 1, 1, 2], ["2020-11-18", 1, 2, 4],
                             ["2020-11-17", 0, 1, 2], ["2020-11-20", 0, 3, 4]]),
                   columns=["Timestamp", "ID", "Name", "slot"])
I want a count for each Name and slot combination, but I want to disregard multiple timestamp values for the same ID. For example, if I simply group by Name and slot I get:
dff.groupby(['Name', "slot"]).Timestamp.count().reset_index(name="count")
Name  slot  count
   1     2      3
   1     4      2
   2     4      3
   3     4      4
   5     4      2
However, for ID == 0 there are two rows with Name == 1 and slot == 2, so instead of 3 I want the count for that combination to be 2.
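For reference, the two rows behind that over-count can be pulled out directly; a small check, assuming the frame built above (note that np.array upcasts the mixed input, so every column holds strings and the comparison values are quoted):
# the two rows of ID 0 that share Name == 1 and slot == 2
# (dated 2020-11-11 and 2020-11-17)
print(dff[(dff.ID == "0") & (dff.Name == "1") & (dff.slot == "2")])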
This is the table I would ideally want.
Name  slot  count
   1     2      2
   1     4      2
   2     4      2
   3     4      2
   5     4      2
I tried:
filter_one = dff.groupby(["ID"]).Timestamp.transform("min")
dff1 = dff.loc[dff.Timestamp == filter_one]
dff1.groupby(["Name", "slot"]).Timestamp.count().reset_index(name="count")
But this gives me:
Name  slot  count
   1     2      1
   1     4      1
   3     4      1
It also does not work if I drop duplicates on ID.
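To show what I mean, here is a rough sketch of that drop-duplicates attempt: de-duplicating on ID alone keeps only the first row of each ID, so every other Name/slot combination of that ID never reaches the count.
# keeps one row per ID (its first occurrence), e.g. ID 0 keeps only its
# 2020-11-13 / Name 3 / slot 4 row, so its Name 1 / slot 2 rows are lost
print(dff.drop_duplicates(subset="ID")
         .groupby(["Name", "slot"]).Timestamp.count()
         .reset_index(name="count"))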
Upvotes: 0
Views: 123
Reputation: 195418
# count distinct IDs per (Name, slot) combination, so repeated rows
# from the same ID are only counted once
x = dff.groupby(["Name", "slot"]).ID.nunique().reset_index(name="count")
print(x)
Prints:
  Name slot  count
0    1    2      2
1    1    4      2
2    2    4      2
3    3    4      2
4    5    4      2
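ID.nunique() counts the distinct IDs inside each (Name, slot) group, so repeated rows from the same ID contribute only once. If you prefer to keep the groupby/count pattern from the question, a sketch of an equivalent approach is to drop duplicate (ID, Name, slot) rows first:
# treat repeated (ID, Name, slot) rows as one observation, then count as before
y = (dff.drop_duplicates(subset=["ID", "Name", "slot"])
        .groupby(["Name", "slot"]).ID.count()
        .reset_index(name="count"))
print(y)  # same counts as above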
Upvotes: 3