How to add another condition when using groupby.nth in python?

Question

I have a DataFrame like the one below:

Each member has a unique Member Code. My aim is to extract the 1st and 5th visits of every day for every member into a new DataFrame without using a for loop. Any ideas on how I should do that?

Part of the sample data on 2018-09-02 is:

I tried df.groupby(df["DateTime"].dt.date).nth([0,4]), but the output is:

As you can see, the third and the fifth rows belong to the same member. When the member is the same, how can I take the next member instead, which here is row number 153 with member code 17753?

Here is the small dataframe as an example:

import pandas as pd

DateTime = ["2018-09-01 00:01:00","2018-09-01 00:35:00","2018-09-01 01:14:00","2018-09-01 01:39:00","2018-09-01 01:46:00",
            "2018-09-02 00:13:00","2018-09-02 00:51:00","2018-09-02 01:23:00","2018-09-02 02:06:00","2018-09-02 02:16:00"]
Duration = [3.0,14.0,9.0,15.0,7.5,1.0,2.0,8.0,3.0,10.0]
Member_Code = [13595,17630,13522,17630,14942,17022,13596,10785,13596,14942]
data = {"DateTime":DateTime, "Duration":Duration, "Member Code":Member_Code}
df1 = pd.DataFrame(data)

# The format must match the strings above: dashes as separators, seconds included
df1["DateTime"] = pd.to_datetime(df1["DateTime"], format = "%Y-%m-%d %H:%M:%S")

df1

So based on this example, I want to extract the 2nd and the 4th visit. The expected output is a DataFrame consisting of the rows with indexes 1, 3, 6 and 9 (since the member in the row with index 8 is the same as in the row with index 6).
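
To see why position alone is not enough, here is a minimal sketch on the sample frame above. It reproduces the plain positional pick (the 2nd and 4th visit of each day) and flags any pick whose member was already picked earlier that day. It uses cumcount rather than nth so that the original row index is kept; the names pos, naive and repeated, and the helper column Date, are illustrative only and not part of the question.

```python
import pandas as pd

DateTime = ["2018-09-01 00:01:00", "2018-09-01 00:35:00", "2018-09-01 01:14:00",
            "2018-09-01 01:39:00", "2018-09-01 01:46:00",
            "2018-09-02 00:13:00", "2018-09-02 00:51:00", "2018-09-02 01:23:00",
            "2018-09-02 02:06:00", "2018-09-02 02:16:00"]
Duration = [3.0, 14.0, 9.0, 15.0, 7.5, 1.0, 2.0, 8.0, 3.0, 10.0]
Member_Code = [13595, 17630, 13522, 17630, 14942, 17022, 13596, 10785, 13596, 14942]
df1 = pd.DataFrame({"DateTime": pd.to_datetime(DateTime),
                    "Duration": Duration,
                    "Member Code": Member_Code})

dates = df1["DateTime"].dt.date

# Zero-based position of each visit within its day
pos = df1.groupby(dates).cumcount()

# Plain positional pick: 2nd and 4th visit of every day (positions 1 and 3)
naive = df1[pos.isin([1, 3])].assign(Date=dates)

# True where the same member was already picked earlier on the same day
repeated = naive.duplicated(subset=["Date", "Member Code"])

print(naive.assign(Repeated=repeated))
```

On this sample the pick lands on rows 1, 3, 6 and 8, and rows 3 and 8 are flagged, because member 17630 also appears twice on 2018-09-01.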

EDITED:

DataFrame:

DateTime = ["2018-09-01 00:01:00","2018-09-01 00:35:00","2018-09-01 01:14:00","2018-09-01 01:39:00","2018-09-01 01:46:00",
            "2018-09-02 00:13:00","2018-09-02 00:51:00","2018-09-02 01:23:00","2018-09-02 02:06:00","2018-09-02 02:16:00"]
Duration = [3.0,14.0,9.0,15.0,7.5,1.0,2.0,8.0,3.0,10.0]
Member_Code = [13595,17630,13522,17630,14942,17022,13596,13596,13596,14942]
data = {"DateTime":DateTime, "Duration":Duration, "Member Code":Member_Code}
df1 = pd.DataFrame(data)

df1["DateTime"] = pd.to_datetime(df1["DateTime"], format = "%Y-%m-%d %H:%M:%S")

df1

Solution:

vals = [1, 2]
df_new = df1.groupby(df1["DateTime"].dt.date).nth(vals)
dates = df1["DateTime"].dt.date
mask = df1['Member Code'].eq(dates.map(df1['Member Code'].groupby(dates).nth(vals[1])))
mask1 = df1['Member Code'].groupby(dates).cumcount().ne(vals[0])
df = df1.mask(mask & mask1).groupby(dates).bfill()
df_new = df.groupby(df1["DateTime"].dt.date).nth(vals)
df_new

Answer:

              DateTime            DateTime  Duration  Member Code
DateTime                                                         
2018-09-01  2018-09-01 2018-09-01 00:35:00      14.0      17630.0
2018-09-01  2018-09-01 2018-09-01 01:39:00      15.0      17630.0
2018-09-02  2018-09-02 2018-09-02 00:51:00       2.0      13596.0
2018-09-02  2018-09-02 2018-09-02 02:16:00      10.0      14942.0

jezrael · Accepted Answer

Use:

import pandas as pd

DateTime = ["2018-09-01 00:01:00","2018-09-01 00:35:00","2018-09-01 01:14:00","2018-09-01 01:39:00","2018-09-01 01:46:00",
            "2018-09-02 00:13:00","2018-09-02 00:51:00","2018-09-02 01:23:00","2018-09-02 02:06:00","2018-09-02 02:16:00"]
Duration = [3.0,14.0,9.0,15.0,7.5,1.0,2.0,8.0,3.0,10.0]
Member_Code = [13595,17630,13522,17630,14942,17022,13596,13596,13596,14942]
data = {"DateTime":DateTime, "Duration":Duration, "Member Code":Member_Code}
df1 = pd.DataFrame(data)

df1["DateTime"] = pd.to_datetime(df1["DateTime"], format = "%Y-%m-%d %H:%M:%S")
print (df1)
             DateTime  Duration  Member Code
0 2018-09-01 00:01:00       3.0        13595
1 2018-09-01 00:35:00      14.0        17630
2 2018-09-01 01:14:00       9.0        13522
3 2018-09-01 01:39:00      15.0        17630
4 2018-09-01 01:46:00       7.5        14942
5 2018-09-02 00:13:00       1.0        17022
6 2018-09-02 00:51:00       2.0        13596
7 2018-09-02 01:23:00       8.0        13596
8 2018-09-02 02:06:00       3.0        13596
9 2018-09-02 02:16:00      10.0        14942

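First, build a mask by comparing every Member Code in a group (day) with the Member Code found at the first nth position of that group. Then exclude the first nth row itself from the mask by testing a per-group counter from cumcount. Pass the combined mask to DataFrame.mask to blank out the matching rows, back fill the missing values per group, and finally apply the original groupby.nth solution: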

vals = [1, 2]
dates = df1["DateTime"].dt.date
mask = df1['Member Code'].eq(dates.map(df1['Member Code'].groupby(dates).nth(vals[0])))
mask1 = df1['Member Code'].groupby(dates).cumcount().ne(vals[0])
df = df1.mask(mask & mask1).groupby(dates).bfill()
print (df)
             DateTime  Duration  Member Code
0 2018-09-01 00:01:00       3.0      13595.0
1 2018-09-01 00:35:00      14.0      17630.0
2 2018-09-01 01:14:00       9.0      13522.0
3 2018-09-01 01:46:00       7.5      14942.0
4 2018-09-01 01:46:00       7.5      14942.0
5 2018-09-02 00:13:00       1.0      17022.0
6 2018-09-02 00:51:00       2.0      13596.0
7 2018-09-02 02:16:00      10.0      14942.0
8 2018-09-02 02:16:00      10.0      14942.0
9 2018-09-02 02:16:00      10.0      14942.0

df_new = df.groupby(df1["DateTime"].dt.date).nth(vals)
print (df_new)
                      DateTime  Duration  Member Code
DateTime                                             
2018-09-01 2018-09-01 00:35:00      14.0      17630.0
2018-09-01 2018-09-01 01:14:00       9.0      13522.0
2018-09-02 2018-09-02 00:51:00       2.0      13596.0
2018-09-02 2018-09-02 02:16:00      10.0      14942.0
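
The snippet above relies on groupby.nth returning a result indexed by the group key, which is how the output shown here looks. In pandas 2.0 and later, nth acts as a filter and keeps the original row index, so the dates.map(...) step may no longer line up. Below is a hedged sketch of the same masking idea that avoids nth entirely by using cumcount and transform. The names pos, first_code and to_blank are illustrative and not from the answer.

```python
import pandas as pd

DateTime = ["2018-09-01 00:01:00", "2018-09-01 00:35:00", "2018-09-01 01:14:00",
            "2018-09-01 01:39:00", "2018-09-01 01:46:00",
            "2018-09-02 00:13:00", "2018-09-02 00:51:00", "2018-09-02 01:23:00",
            "2018-09-02 02:06:00", "2018-09-02 02:16:00"]
Duration = [3.0, 14.0, 9.0, 15.0, 7.5, 1.0, 2.0, 8.0, 3.0, 10.0]
Member_Code = [13595, 17630, 13522, 17630, 14942, 17022, 13596, 13596, 13596, 14942]
df1 = pd.DataFrame({"DateTime": DateTime, "Duration": Duration, "Member Code": Member_Code})
df1["DateTime"] = pd.to_datetime(df1["DateTime"], format="%Y-%m-%d %H:%M:%S")

vals = [1, 2]
dates = df1["DateTime"].dt.date

# Zero-based position of each row within its day
pos = df1.groupby(dates).cumcount()

# Member Code sitting at position vals[0], broadcast to every row of the same day
first_code = (df1["Member Code"]
              .where(pos.eq(vals[0]))
              .groupby(dates)
              .transform("first"))

# Rows that repeat that member, excluding the row at position vals[0] itself
to_blank = df1["Member Code"].eq(first_code) & pos.ne(vals[0])

# Blank those rows, then pull the next visit of the same day into their place
df = df1.mask(to_blank).groupby(dates).bfill()

# Positional pick that keeps the original row index on any pandas version
df_new = df[pos.isin(vals)]
print(df_new)
```

This picks the rows at original index 1, 2, 6 and 7, where row 7 carries the back-filled values of row 9, so the visits match the four rows printed above. The index holds row labels here rather than dates.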
