Reputation: 11
I have dataframe with 3 columns as shown here:
ID | Date | Page |
---|---|---|
1 | 1 | A |
1 | 1 | C |
1 | 1 | B |
2 | 1 | B |
2 | 2 | C |
2 | 2 | A |
import pandas as pd
data = {
'ID': [1,1,1,2,2,2],
'Date' : [1,1,1,1,2,2],
'Page' : ['A', 'C', 'B', 'B', 'C', 'A']
}
df = pd.DataFrame(data)
print(df)
I want customers(ID
) who have visited page B
from page A
by ignoring other pages and dates. Like in the given data ID
2 visited from page B
to page A
(ignore any other page in between.)
Expected output:
ID
2.
Upvotes: 1
Views: 56
Reputation: 1032
If I understand you, I think it will help you to implement as follows:
import pandas as pd
data = {
'ID': [1,1,1,2,2,2],
'Date' : [1,1,1,1,2,2],
'Page' : ['A', 'C', 'B', 'B', 'C', 'A']
}
df = pd.DataFrame(data)
# remove all records that not A or B
df = df[df['Page'].isin(['A', 'B'])]
# sort the values by ID at first and then by Date
df.sort_values(["ID", "Date"])
# create new dataframe with the A and B order
concat_df = df.groupby(["ID"])['Page'].apply(','.join).reset_index()
# filter the cases that B visited first
print(concat_df[concat_df['Page'] == "B,A"])
Upvotes: 2