Reputation: 543
I'm new to pandas and I'm having a problem selecting rows from a DataFrame.
This is my DataFrame:
Index Column1 Column2 Column3 Column4
0 1234 500 NEWYORK NY
1 5678 700 AUSTIN TX
2 1234 300 NEWYORK NY
3 8910 235 RICHMOND FL
I want to select the rows that share the same values in Column1, Column3 and Column4 (that is, rows that are identical in terms of these three columns). The output DataFrame should therefore contain the rows with index 0 and 2.
Can anyone help me with a step-by-step procedure for this custom selection?
Upvotes: 1
Views: 1355
Reputation: 543
Earlier I was using the following approach:
# Map each row label to a dict of that row's values.
d = df.T.to_dict()
dup = set()
for i in d:
    for j in d:
        if i != j:
            # Compare every pair of distinct rows on the three key columns.
            if (d[i]['Column1'] == d[j]['Column1']
                    and d[i]['Column3'] == d[j]['Column3']
                    and d[i]['Column4'] == d[j]['Column4']):
                dup.add(i)
# Keep only the rows whose label was flagged as a duplicate.
dup_rows = df[df.index.isin(dup)]
Upvotes: 0
Reputation: 402413
Use df.duplicated on the three columns to build a boolean mask, then index into df with that mask:
c = ['Column1', 'Column3', 'Column4']
df = df[df[c].duplicated(keep=False)]
df
Index Column1 Column2 Column3 Column4
0 0 1234 500 NEWYORK NY
2 2 1234 300 NEWYORK NY
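keep=False marks every row in a duplicate group, including the first occurrence, so all of them pass the filter. With the default keep='first', the first row of each group would not be marked and would be dropped.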
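For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question (assuming "Index" there is just the default row index rather than a real column) and uses the subset parameter of duplicated, which should be equivalent to selecting the columns first. The names cols, mask_all and mask_first are only illustrative.

```python
import pandas as pd

# Sample data from the question; the default RangeIndex plays the role of "Index".
df = pd.DataFrame({
    "Column1": [1234, 5678, 1234, 8910],
    "Column2": [500, 700, 300, 235],
    "Column3": ["NEWYORK", "AUSTIN", "NEWYORK", "RICHMOND"],
    "Column4": ["NY", "TX", "NY", "FL"],
})

# Columns that define what counts as "the same row".
cols = ["Column1", "Column3", "Column4"]

# keep=False flags every member of a duplicate group.
mask_all = df.duplicated(subset=cols, keep=False)

# The default (keep="first") leaves the first occurrence unflagged.
mask_first = df.duplicated(subset=cols)

result = df[mask_all]

print(mask_all.tolist())      # [True, False, True, False]
print(mask_first.tolist())    # [False, False, True, False]
print(result.index.tolist())  # [0, 2]
```

Unlike the nested loop in the question, this compares rows in a single vectorized pass, so it should scale much better on larger frames.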
Upvotes: 3