smallfox
smallfox

Reputation: 1

How to iterate over rows and drop all other rows where column matches?

I'm trying to go row by row in a dataframe and delete any rows that have the same 'hole_ID' but keep the original row. So that the nearest neighbour only searches in different holes. Here's what I have so far:

import pandas as pd
s1 = StringIO(u'''east,north,elev,hole_ID
11,11,5,A
51,51,6,A
61,61,11,A
21,21,2,B
31,31,3,B
71,71,3,B
81,81,4,B''')

df2 = pd.read_csv(s1)

for idx,row in df2.iterrows():
    dftype= df2.drop_duplicates(subset=['hole_ID'], keep='first')

This is what I get:

Out[20]: 
   east  north  elev hole_ID
0    11     11     5       A
3    21     21     2       B

And this is what I want to get:

Out[18]: 
   east  north  elev hole_ID
0    11     11     5       A
3    21     21     2       B
4    31     31     3       B
5    71     71     3       B
6    81     81     4       B

So for row 1, all the other rows with the same hole_ID ('A') are dropped.

EDIT: I need to do this for every row in the original data frame to perform a nearest neighbour calculation where the hole_ID's do not match.

Thanks in advance.

Upvotes: 0

Views: 139

Answers (2)

ansev
ansev

Reputation: 30930

I would create a function. Use Series.isin to be able to select different ID

def remove_by_hole_ID(df,hole_ID):
    if not isinstance(hole_ID,list):
        hole_ID = [hole_ID]
    m = df['hole_ID'].isin(hole_ID)
    return pd.concat([df[m].drop_duplicates(subset = 'hole_ID'),df[~m]],sort = True)

print(remove_by_hole_ID(df,'A'))
   east  elev hole_ID  north
0    11     5       A     11
3    21     2       B     21
4    31     3       B     31
5    71     3       B     71
6    81     4       B     81


print(remove_by_hole_ID(df,['A','B']))
   east  elev hole_ID  north
0    11     5       A     11
3    21     2       B     21

Upvotes: 1

yatu
yatu

Reputation: 88276

If you only want to drop duplicates where hole_ID is A, you can pd.concat on a one side the dataframe indexed when that is true and dropping duplicates, and on the other the rest of the cases:

pd.concat([ 
    df2[df2.hole_ID.eq('A')].drop_duplicates(subset=['hole_ID'], keep='first'),
    df2[df2.hole_ID.ne('A')]],
    axis=0)

    east  north  elev hole_ID
0    11     11     5       A
3    21     21     2       B
4    31     31     3       B
5    71     71     3       B
6    81     81     4       B

Upvotes: 1

Related Questions