Reputation: 503
I am sorry I could not fit the whole problem into the title concisely. Pardon my English; I will explain my problem with an example.
Say I have this dataset:
import numpy as np
import pandas as pd

dff = pd.DataFrame(np.array([["2020-11-13", 0, 3, 4], ["2020-10-11", 1, 3, 4], ["2020-11-13", 2, 1, 4],
                             ["2020-11-14", 0, 3, 4], ["2020-11-13", 1, 5, 4], ["2020-11-14", 2, 2, 4],
                             ["2020-11-12", 1, 1, 4], ["2020-11-14", 1, 2, 4], ["2020-11-15", 2, 5, 4],
                             ["2020-11-11", 0, 1, 2], ["2020-11-15", 1, 1, 2], ["2020-11-18", 1, 2, 4],
                             ["2020-11-17", 0, 1, 2], ["2020-11-20", 0, 3, 4]]),
                   columns=["Timestamp", "ID", "Name", "slot"])
I want a count for each Name and slot combination, but I want to disregard multiple timestamp values for the same ID. For example, if I simply group by Name and slot I get:
dff.groupby(['Name', "slot"]).Timestamp.count().reset_index(name="count")
Name  slot  count
   1     2      3
   1     4      2
   2     4      3
   3     4      4
   5     4      2
However, for ID == 0 there are two rows with Name == 1 and slot == 2, so instead of 3 I want the count for that combination to be 2.
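For reference, the two rows behind that over-count can be pulled out directly; a small check, assuming the frame built above (note that np.array upcasts the mixed input, so every column holds strings and the comparison values are quoted):
# the two rows of ID 0 that share Name == 1 and slot == 2
# (dated 2020-11-11 and 2020-11-17)
print(dff[(dff.ID == "0") & (dff.Name == "1") & (dff.slot == "2")])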
This is the table I would ideally want.
Name  slot  count
   1     2      2
   1     4      2
   2     4      2
   3     4      2
   5     4      2
I tried:
filter_one = dff.groupby(["ID"]).Timestamp.transform("min")
dff1 = dff.loc[dff.Timestamp == filter_one]
dff1.groupby(["Name", "slot"]).Timestamp.count().reset_index(name="count")
But this gives me:
Name  slot  count
   1     2      1
   1     4      1
   3     4      1
It also does not work if I drop duplicates on ID.
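To show what I mean, here is a rough sketch of that drop-duplicates attempt: de-duplicating on ID alone keeps only the first row of each ID, so every other Name/slot combination of that ID never reaches the count.
# keeps one row per ID (its first occurrence), e.g. ID 0 keeps only its
# 2020-11-13 / Name 3 / slot 4 row, so its Name 1 / slot 2 rows are lost
print(dff.drop_duplicates(subset="ID")
         .groupby(["Name", "slot"]).Timestamp.count()
         .reset_index(name="count"))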
Upvotes: 0
Views: 123
Reputation: 195418
# count distinct IDs per (Name, slot) combination, so repeated rows
# from the same ID are only counted once
x = dff.groupby(["Name", "slot"]).ID.nunique().reset_index(name="count")
print(x)
Prints:
  Name slot  count
0    1    2      2
1    1    4      2
2    2    4      2
3    3    4      2
4    5    4      2
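ID.nunique() counts the distinct IDs inside each (Name, slot) group, so repeated rows from the same ID contribute only once. If you prefer to keep the groupby/count pattern from the question, a sketch of an equivalent approach is to drop duplicate (ID, Name, slot) rows first:
# treat repeated (ID, Name, slot) rows as one observation, then count as before
y = (dff.drop_duplicates(subset=["ID", "Name", "slot"])
        .groupby(["Name", "slot"]).ID.count()
        .reset_index(name="count"))
print(y)  # same counts as above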
Upvotes: 3