Reputation: 1
I'm trying to go row by row in a dataframe and delete any rows that have the same 'hole_ID' but keep the original row. So that the nearest neighbour only searches in different holes. Here's what I have so far:
import pandas as pd
s1 = StringIO(u'''east,north,elev,hole_ID
11,11,5,A
51,51,6,A
61,61,11,A
21,21,2,B
31,31,3,B
71,71,3,B
81,81,4,B''')
df2 = pd.read_csv(s1)
for idx,row in df2.iterrows():
dftype= df2.drop_duplicates(subset=['hole_ID'], keep='first')
This is what I get:
Out[20]:
east north elev hole_ID
0 11 11 5 A
3 21 21 2 B
And this is what I want to get:
Out[18]:
east north elev hole_ID
0 11 11 5 A
3 21 21 2 B
4 31 31 3 B
5 71 71 3 B
6 81 81 4 B
So for row 1, all the other rows with the same hole_ID ('A') are dropped.
EDIT: I need to do this for every row in the original data frame to perform a nearest neighbour calculation where the hole_ID's do not match.
Thanks in advance.
Upvotes: 0
Views: 139
Reputation: 30930
I would create a function. Use Series.isin
to be able to select different ID
def remove_by_hole_ID(df,hole_ID):
if not isinstance(hole_ID,list):
hole_ID = [hole_ID]
m = df['hole_ID'].isin(hole_ID)
return pd.concat([df[m].drop_duplicates(subset = 'hole_ID'),df[~m]],sort = True)
print(remove_by_hole_ID(df,'A'))
east elev hole_ID north
0 11 5 A 11
3 21 2 B 21
4 31 3 B 31
5 71 3 B 71
6 81 4 B 81
print(remove_by_hole_ID(df,['A','B']))
east elev hole_ID north
0 11 5 A 11
3 21 2 B 21
Upvotes: 1
Reputation: 88276
If you only want to drop duplicates where hole_ID
is A
, you can pd.concat
on a one side the dataframe indexed when that is true and dropping duplicates, and on the other the rest of the cases:
pd.concat([
df2[df2.hole_ID.eq('A')].drop_duplicates(subset=['hole_ID'], keep='first'),
df2[df2.hole_ID.ne('A')]],
axis=0)
east north elev hole_ID
0 11 11 5 A
3 21 21 2 B
4 31 31 3 B
5 71 71 3 B
6 81 81 4 B
Upvotes: 1