Bikash Jha
Bikash Jha

Reputation: 11

Subsetting the data

I have dataframe with 3 columns as shown here:

ID Date Page
1 1 A
1 1 C
1 1 B
2 1 B
2 2 C
2 2 A
import pandas as pd
data = {
'ID': [1,1,1,2,2,2],
'Date' : [1,1,1,1,2,2],
'Page' : ['A', 'C', 'B', 'B', 'C', 'A']
       }
    
df = pd.DataFrame(data)
print(df)

I want customers(ID) who have visited page B from page A by ignoring other pages and dates. Like in the given data ID 2 visited from page B to page A(ignore any other page in between.)

Expected output:

ID 
2.  

Upvotes: 1

Views: 56

Answers (1)

Yanirmr
Yanirmr

Reputation: 1032

If I understand you, I think it will help you to implement as follows:

import pandas as pd
data = {
'ID': [1,1,1,2,2,2],
'Date' : [1,1,1,1,2,2],
'Page' : ['A', 'C', 'B', 'B', 'C', 'A']
}

df = pd.DataFrame(data)

# remove all records that not A or B
df = df[df['Page'].isin(['A', 'B'])]

# sort the values by ID at first and then by Date
df.sort_values(["ID", "Date"]) 

# create new dataframe with the A and B order 
concat_df = df.groupby(["ID"])['Page'].apply(','.join).reset_index()

# filter the cases that B visited first
print(concat_df[concat_df['Page'] == "B,A"])

Upvotes: 2

Related Questions