Reputation: 196
Data frame 1
cust_id1 cust_id2
23 50
23 51
23 52
24 32
25 40
25 41
Dataframe 2
cust_id
50
51
52
32
40
FOR EACH cust_id1, check if cust_id2 of Dataframe 1 is present in cust_id of Dataframe 2. IF YES, THEN create a new data frame. Also, I further would like to add a new column count for cust_id2
Expected Output in a new data frame 3
cust_id1 cust_id2 count_cust_id2
23 50 1
23 51 1
23 52 1
24 32 1
25 40 1
This is what I have tried
uniq = df1['cust_id1'].dropna().unique()
cust_id_grouped = df1.groupby(['cust_id1'], as_index = False)
for i in uniq:
all_groups = cust_id_grouped.get_group(i)
count_cust_id2 = 0
for index, row in all_groups.iterrows():
if (row.cust_id1.isin(df2. cust_id)):
count_cust_id2 = count_cust_id2 + 1
Part 2 In addition to the above data frames, I have similar data frames.
Dataframe 4
cust_id1 cust_id2
23 50
23 51
23 52
24 32
25 40
25 41
26 50
Dataframe 5
cust_id
50
51
52
32
40
41
50
The Expected Output should update data frame 3,
cust_id1 cust_id2 count_cust_id2
23 50 2
23 51 2
23 52 2
24 32 2
25 40 2
25 41 1
26 50 1
I have multiple of these data frames and each of these groups(df1-df2) and (df4-df5) represents a single month. The count is actually for the month.
Upvotes: 0
Views: 59
Reputation: 862671
Use Series.isin
for test all values, testing per groups is not necessary. Last get counts by GroupBy.transform
and GroupBy.size
:
df3 = df1[df1.cust_id2.isin(df2.cust_id)].copy()
df3['count_cust_id2'] = df3.groupby('cust_id2')['cust_id2'].transform('size')
print (df3)
cust_id1 cust_id2 count_cust_id2
0 23 50 1
1 23 51 1
2 23 52 1
3 24 32 1
4 25 40 1
Upvotes: 1