Reputation: 323
I want to drop duplicated rows from a dataframe, depending on the type of the values in one column. For example, my dataframe is:
A B
3 4
3 4
3 5
yes 8
no 8
yes 8
If df['A'] is a number, I want to drop_duplicates().
If df['A'] is a string, I want to keep the duplicates.
So the desired result would be:
A B
3 4
3 5
yes 8
no 8
yes 8
Besides using for loops, is there a Pythonic way to do that? Thanks!
Upvotes: 1
Views: 275
Reputation: 54330
Create a new column C: if the value in column A is numeric, assign a common value in C; otherwise assign a unique value in C.
After that, just drop_duplicates as normal.
Note: there is a nice isnumeric() string method for testing if a cell is number-like.
In [47]:
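import numpy as np
# Assumes column A holds strings: numeric rows share C == 1, other rows get their unique index
df['C'] = np.where(df.A.str.isnumeric(), 1, df.index)
print(df)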
     A  B  C
0    3  4  1
1    3  4  1
2    3  5  1
3  yes  8  3
4   no  8  4
5  yes  8  5
In [48]:
print(df.drop_duplicates()[['A', 'B']])  # reset the index if needed
     A  B
0    3  4
2    3  5
3  yes  8
4   no  8
5  yes  8
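For anyone on Python 3 and a current pandas, here is a minimal self-contained sketch of the same helper-column idea. The sample frame is rebuilt from the question with column A stored as strings (an assumption), and the function name drop_numeric_duplicates and the temporary column name _key are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; A is stored as strings (an assumption).
df = pd.DataFrame({
    "A": ["3", "3", "3", "yes", "no", "yes"],
    "B": [4, 4, 5, 8, 8, 8],
})

def drop_numeric_duplicates(frame, col="A"):
    """Hypothetical helper: drop duplicate rows only where `col` is number-like."""
    is_num = frame[col].astype(str).str.isnumeric()
    # Numeric rows share one key (-1); every other row gets a unique position key,
    # so it can never be flagged as a duplicate.
    key = np.where(is_num, -1, np.arange(len(frame)))
    keep = ~frame.assign(_key=key).duplicated()
    return frame[keep]

result = drop_numeric_duplicates(df)
print(result)
```

Keep in mind that str.isnumeric() is False for strings such as "-3" or "3.5", so rows like those would be treated as non-numeric and kept as-is.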
Upvotes: 3
Reputation: 8880
This solution is more verbose, but might be more flexible for more involved tests:
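import pandas as pd

def true_if_number(x):
    """Return True if x can be parsed as an integer."""
    try:
        int(x)
        return True
    except (ValueError, TypeError):
        return False

# Boolean mask: True where A looks like a number
rows_numeric = df['A'].apply(true_if_number)
# Deduplicate whole rows (so B is kept) among the numeric ones, then add back the rest
result = pd.concat([df[rows_numeric].drop_duplicates(), df[~rows_numeric]])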
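A possible variation on this, not the code above: instead of splitting the frame and gluing the pieces back together, combine the numeric test with DataFrame.duplicated() in a single boolean mask, which also keeps the rows in their original order. This is only a sketch; the sample frame is rebuilt from the question (with real integers in A, which is an assumption about how the data is stored).

```python
import pandas as pd

# Sample data from the question; A mixes integers and strings (an assumption).
df = pd.DataFrame({
    "A": [3, 3, 3, "yes", "no", "yes"],
    "B": [4, 4, 5, 8, 8, 8],
})

def true_if_number(x):
    """Return True if x can be parsed as an integer."""
    try:
        int(x)
        return True
    except (ValueError, TypeError):
        return False

rows_numeric = df["A"].apply(true_if_number)
# A row is dropped only if it is numeric AND repeats an earlier row.
result = df[~(rows_numeric & df.duplicated())]
print(result)
```

Because the test lives in an ordinary function, it can be swapped for something stricter or looser (for example float(x) to accept decimals) without touching the masking step.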
Upvotes: 1