Reputation: 329
I have the following data set in Table_Record:
Seg_ID Lock_ID Code
111 100 1
222 121 2
333 341 2
444 100 1
555 100 1
666 341 2
777 554 4
888 332 5
I am using this SQL query to find the Seg_IDs where Lock_ID is repeated:
SELECT Code, Lock_ID, Seg_ID FROM Table_Record GROUP BY Code, Lock_ID;
Seg_ID Lock_ID Code
111 100 1
444 100 1
555 100 1
222 121 2
333 341 2
666 341 2
777 554 4
888 332 5
How can I achieve the same using Pandas?
Expected output from Pandas, e.g.:
Seg_ID (111,444,555) has Code (1).
Seg_ID (222,333,666) has Code (2).
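(For reference, a minimal sketch that builds the sample data above as a pandas DataFrame; df is the name the answers below assume:)

import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Seg_ID':  [111, 222, 333, 444, 555, 666, 777, 888],
    'Lock_ID': [100, 121, 341, 100, 100, 341, 554, 332],
    'Code':    [1, 2, 2, 1, 1, 2, 4, 5],
})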
Upvotes: 3
Views: 1795
Reputation: 862511
First get all Codes by filtering only duplicated Lock_ID values, and then filter the original DataFrame by boolean indexing with isin:
# Codes that belong to a duplicated Lock_ID
codes = df.loc[df.duplicated(['Lock_ID']), 'Code'].unique()
# Keep only rows whose Code appears among those
df1 = df[df['Code'].isin(codes)]
print (df1)
Seg_ID Lock_ID Code
0 111 100 1
1 222 121 2
2 333 341 2
3 444 100 1
4 555 100 1
5 666 341 2
Then use groupby with f-strings:
# Group by Code as a scalar key so k prints as a plain value
for k, v in df1.groupby('Code')['Seg_ID']:
    print (f'Seg_ID {tuple(v)} has Code ({k})')
Seg_ID (111, 444, 555) has Code (1)
Seg_ID (222, 333, 666) has Code (2)
If you want the output as a DataFrame, use apply with tuple:
# Collapse each group's Seg_IDs into a tuple, one row per Code
df2 = df1.groupby('Code')['Seg_ID'].apply(tuple).reset_index()
print (df2)
Code Seg_ID
0 1 (111, 444, 555)
1 2 (222, 333, 666)
Upvotes: 3
Reputation: 672
Simply use groupby. As I understand from your code, you'd want:
grouped = df.groupby(['Code', 'Lock_ID'])
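As a minimal sketch of how that grouped object could then be inspected (assuming df has the columns shown in the question):

# Collect the Seg_IDs for each (Code, Lock_ID) pair
print (grouped['Seg_ID'].apply(list).reset_index())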
Upvotes: 1