Reputation: 571
I have a dataframe with the columns df['article_id'] and df['user_id'].
I also have a NumPy array (or a list; I figured a NumPy array would be faster for this) that contains article_id and user_id pairs.
The point is to compare the df with the array so I can filter out duplicate entries. A row counts as a duplicate only if both its user_id and its article_id match. So the idea is:
if df['article_id'] == nparray[:,0] & df['user_id'] == nparray[:,1]:
    remove the row from the dataframe
Here is what the array and the df look like (for now there is only one user_id, but there will be more later). Any dataframe row whose values also appear in the array should be deleted:
array([[1127087222, 1],
       [1202623831, 1],
       [1747352473, 1],
       [1748645480, 1],
       [1759957596, 1],
       [1811054956, 1]])
   user_id  article_id           date_saved
0        1  2579244390  2019-05-09 10:46:23
1        1  2580336884  2019-05-09 10:46:22
2        1  1202623831  2019-05-09 10:46:20
3        1  2450784233  2019-01-11 12:36:44
4        1  1747352473  2019-01-03 21:38:34
Desired output:
   user_id  article_id           date_saved
0        1  2579244390  2019-05-09 10:46:23
1        1  2580336884  2019-05-09 10:46:22
3        1  2450784233  2019-01-11 12:36:44
How can I achieve this?
Upvotes: 0
Views: 407
Reputation: 25259
Following your clarification, you can get the desired output with np.isin and the negation operator '~' as follows:
df[~np.isin(df[['user_id', 'article_id']], nparray).all(axis=1)]
Out[17]:
   user_id  article_id           date_saved
0        1  2579244390  2019-05-09 10:46:23
1        1  2580336884  2019-05-09 10:46:22
3        1  2450784233  2019-01-11 12:36:44
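One caveat: np.isin tests each value on its own rather than each (user_id, article_id) pair, so once there are several users it could drop a row whose user_id and article_id both appear in the array but in different entries. A minimal sketch of a pairwise check is below. It swaps in a left merge with indicator=True (an anti-join) instead of np.isin, and it assumes that column 0 of nparray is article_id and column 1 is user_id, as in the question's pseudocode. The names seen, merged, keep and filtered are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
nparray = np.array(
    [
        [1127087222, 1],
        [1202623831, 1],
        [1747352473, 1],
        [1748645480, 1],
        [1759957596, 1],
        [1811054956, 1],
    ],
    dtype=np.int64,
)
df = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 1, 1],
        "article_id": [2579244390, 2580336884, 1202623831, 2450784233, 1747352473],
        "date_saved": pd.to_datetime(
            [
                "2019-05-09 10:46:23",
                "2019-05-09 10:46:22",
                "2019-05-09 10:46:20",
                "2019-01-11 12:36:44",
                "2019-01-03 21:38:34",
            ]
        ),
    }
)

# Assumption: nparray columns are (article_id, user_id), in that order.
# drop_duplicates keeps the merge from multiplying rows of df.
seen = pd.DataFrame(nparray, columns=["article_id", "user_id"]).drop_duplicates()

# A left merge keeps every row of df in its original order; the _merge
# column says whether the (user_id, article_id) pair was also found in seen.
merged = df.merge(seen, on=["user_id", "article_id"], how="left", indicator=True)

# Keep only the rows whose pair does not occur in the array.
keep = (merged["_merge"] == "left_only").to_numpy()
filtered = df[keep]
print(filtered)
```

On the sample data this keeps the same rows as the np.isin version (index 0, 1 and 3). The two only differ when a user_id and an article_id both occur in the array but never together in one entry.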
Upvotes: 1