Reputation: 425
I have a DataFrame as below:
Each member has a unique Member Code. My aim is to extract the 1st and 5th visits of every member on every day into a new DataFrame without using a for loop. Any ideas on how I should do that?
Part of the sample data on 2018-09-02 is:
I tried to use: df.groupby(df["DateTime"].dt.date).nth([0,4])
but the output is:
As you can see, the third and the fifth rows are from the same member. Since the member is the same, how can I take the next member instead, which is row number 153 with member code 17753?
Here is the small dataframe as an example:
import pandas as pd
DateTime = ["2018-09-01 00:01:00","2018-09-01 00:35:00","2018-09-01 01:14:00","2018-09-01 01:39:00","2018-09-01 01:46:00",
"2018-09-02 00:13:00","2018-09-02 00:51:00","2018-09-02 01:23:00","2018-09-02 02:06:00","2018-09-02 02:16:00"]
Duration = [3.0,14.0,9.0,15.0,7.5,1.0,2.0,8.0,3.0,10.0]
Member_Code = [13595,17630,13522,17630,14942,17022,13596,10785,13596,14942]
data = {"DateTime":DateTime, "Duration":Duration, "Member Code":Member_Code}
df1 = pd.DataFrame(data)
# the format string has to match the "YYYY-MM-DD HH:MM:SS" strings above
df1["DateTime"] = pd.to_datetime(df1["DateTime"], format = "%Y-%m-%d %H:%M:%S")
df1
Based on this example, I want to extract the 2nd and the 4th visit. The expected output is a DataFrame consisting of the rows with indexes 1, 3, 6 and 9 (since the member in the row with index 8 is the same as in the row with index 6).
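For illustration only (this is not part of the attempt above), here is a minimal sketch of how repeat visits by the same member on the same day could be flagged with cumcount, using the sample data above. The names days and repeat are assumptions introduced here.

```python
import pandas as pd

df1 = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2018-09-01 00:01:00", "2018-09-01 00:35:00", "2018-09-01 01:14:00",
        "2018-09-01 01:39:00", "2018-09-01 01:46:00", "2018-09-02 00:13:00",
        "2018-09-02 00:51:00", "2018-09-02 01:23:00", "2018-09-02 02:06:00",
        "2018-09-02 02:16:00",
    ]),
    "Duration": [3.0, 14.0, 9.0, 15.0, 7.5, 1.0, 2.0, 8.0, 3.0, 10.0],
    "Member Code": [13595, 17630, 13522, 17630, 14942,
                    17022, 13596, 10785, 13596, 14942],
})

# Calendar day of each visit (hypothetical helper name).
days = df1["DateTime"].dt.date.rename("day")

# Number the visits of each member within each day;
# any visit after the first one (count > 0) is a repeat.
repeat = df1.groupby([days, df1["Member Code"]]).cumcount().gt(0)

print(df1[repeat])
```

On this sample it flags the rows with index 3 and 8, which are the second visits of members 17630 and 13596 on their respective days.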
EDITED:
DataFrame:
DateTime = ["2018-09-01 00:01:00","2018-09-01 00:35:00","2018-09-01 01:14:00","2018-09-01 01:39:00","2018-09-01 01:46:00",
"2018-09-02 00:13:00","2018-09-02 00:51:00","2018-09-02 01:23:00","2018-09-02 02:06:00","2018-09-02 02:16:00"]
Duration = [3.0,14.0,9.0,15.0,7.5,1.0,2.0,8.0,3.0,10.0]
Member_Code = [13595,17630,13522,17630,14942,17022,13596,13596,13596,14942]
data = {"DateTime":DateTime, "Duration":Duration, "Member Code":Member_Code}
df1 = pd.DataFrame(data)
df1["DateTime"] = pd.to_datetime(df1["DateTime"], format = "%Y-%m-%d %H:%M:%S")
df1
Solution:
vals = [1, 2]
df_new = df1.groupby(df1["DateTime"].dt.date).nth(vals)
dates = df1["DateTime"].dt.date
mask = df1['Member Code'].eq(dates.map(df1['Member Code'].groupby(dates).nth(vals[1])))
mask1 = df1['Member Code'].groupby(dates).cumcount().ne(vals[0])
df = df1.mask(mask & mask1).groupby(dates).bfill()
df_new = df.groupby(df1["DateTime"].dt.date).nth(vals)
df_new
Answer:
DateTime DateTime Duration Member Code
DateTime
2018-09-01 2018-09-01 2018-09-01 00:35:00 14.0 17630.0
2018-09-01 2018-09-01 2018-09-01 01:39:00 15.0 17630.0
2018-09-02 2018-09-02 2018-09-02 00:51:00 2.0 13596.0
2018-09-02 2018-09-02 2018-09-02 02:16:00 10.0 14942.0
Upvotes: 1
Views: 252
Reputation: 862771
Use:
import pandas as pd
DateTime = ["2018-09-01 00:01:00","2018-09-01 00:35:00","2018-09-01 01:14:00","2018-09-01 01:39:00","2018-09-01 01:46:00",
"2018-09-02 00:13:00","2018-09-02 00:51:00","2018-09-02 01:23:00","2018-09-02 02:06:00","2018-09-02 02:16:00"]
Duration = [3.0,14.0,9.0,15.0,7.5,1.0,2.0,8.0,3.0,10.0]
Member_Code = [13595,17630,13522,17630,14942,17022,13596,13596,13596,14942]
data = {"DateTime":DateTime, "Duration":Duration, "Member Code":Member_Code}
df1 = pd.DataFrame(data)
# the format string has to match the "YYYY-MM-DD HH:MM:SS" strings above
df1["DateTime"] = pd.to_datetime(df1["DateTime"], format = "%Y-%m-%d %H:%M:%S")
print (df1)
DateTime Duration Member Code
0 2018-09-01 00:01:00 3.0 13595
1 2018-09-01 00:35:00 14.0 17630
2 2018-09-01 01:14:00 9.0 13522
3 2018-09-01 01:39:00 15.0 17630
4 2018-09-01 01:46:00 7.5 14942
5 2018-09-02 00:13:00 1.0 17022
6 2018-09-02 00:51:00 2.0 13596
7 2018-09-02 01:23:00 8.0 13596
8 2018-09-02 02:06:00 3.0 13596
9 2018-09-02 02:16:00 10.0 14942
First, compare the Member Code of the first nth value with all values of the column within each group to build a mask. Also exclude the first nth row itself by testing a per-group counter from cumcount. Pass the combined mask to DataFrame.mask, back fill the missing values within each group, and finally apply the original solution:
vals = [1, 2]
dates = df1["DateTime"].dt.date
mask = df1['Member Code'].eq(dates.map(df1['Member Code'].groupby(dates).nth(vals[0])))
mask1 = df1['Member Code'].groupby(dates).cumcount().ne(vals[0])
df = df1.mask(mask & mask1).groupby(dates).bfill()
print (df)
DateTime Duration Member Code
0 2018-09-01 00:01:00 3.0 13595.0
1 2018-09-01 00:35:00 14.0 17630.0
2 2018-09-01 01:14:00 9.0 13522.0
3 2018-09-01 01:46:00 7.5 14942.0
4 2018-09-01 01:46:00 7.5 14942.0
5 2018-09-02 00:13:00 1.0 17022.0
6 2018-09-02 00:51:00 2.0 13596.0
7 2018-09-02 02:16:00 10.0 14942.0
8 2018-09-02 02:16:00 10.0 14942.0
9 2018-09-02 02:16:00 10.0 14942.0
df_new = df.groupby(df1["DateTime"].dt.date).nth(vals)
print (df_new)
DateTime Duration Member Code
DateTime
2018-09-01 2018-09-01 00:35:00 14.0 17630.0
2018-09-01 2018-09-01 01:14:00 9.0 13522.0
2018-09-02 2018-09-02 00:51:00 2.0 13596.0
2018-09-02 2018-09-02 02:16:00 10.0 14942.0
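As far as I know, newer pandas releases (2.0 onward) make groupby.nth keep the original row labels instead of the group keys, so the dates.map(...) lookup above may no longer line up. The following is a minimal sketch of the same mask and back fill idea that avoids nth entirely. It assumes the positions in vals are zero-based, and the names days, pos, first_member and repeat are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2018-09-01 00:01:00", "2018-09-01 00:35:00", "2018-09-01 01:14:00",
        "2018-09-01 01:39:00", "2018-09-01 01:46:00", "2018-09-02 00:13:00",
        "2018-09-02 00:51:00", "2018-09-02 01:23:00", "2018-09-02 02:06:00",
        "2018-09-02 02:16:00",
    ]),
    "Duration": [3.0, 14.0, 9.0, 15.0, 7.5, 1.0, 2.0, 8.0, 3.0, 10.0],
    "Member Code": [13595, 17630, 13522, 17630, 14942,
                    17022, 13596, 13596, 13596, 14942],
})

vals = [1, 2]  # zero-based positions within each day
days = df1["DateTime"].dt.date.rename("day")

# Zero-based position of every row within its day.
pos = df1.groupby(days).cumcount()

# Member code found at position vals[0], broadcast to every row of that day.
first_member = (
    df1["Member Code"].where(pos.eq(vals[0])).groupby(days).transform("first")
)

# Other rows of that member are blanked, then back filled within the day.
repeat = df1["Member Code"].eq(first_member) & pos.ne(vals[0])
filled = df1.mask(repeat).groupby(days).bfill()

# Select the requested positions; the original row labels are kept.
df_new = filled[pos.isin(vals)]
print(df_new)
```

Selecting with pos.isin(vals) returns the same rows as the nth(vals) call above, but indexed by the original row labels rather than by date.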
Upvotes: 1