Reputation: 496
I want to select all rows in a dataframe which contain values defined in a list. I've got two approaches which both do not work as expected/wanted.
My dataframe looks something like this:
Timestamp DEVICE READING VALUE
1 | DEV1 | READ1 | randomvalue
2 | DEV1 | READ2 | randomvalue
3 | DEV2 | READ1 | randomvalue
4 | DEV2 | READ2 | randomvalue
5 | DEV3 | READ1 | randomvalue
and I've got the list (ls) like follows:
[[DEV1, READ1], [DEV1, READ2], [DEV2,READ1]]
In this scenario I want to remove line 4
and 5
:
My first approach was:
df = df[(df['DEVICE']. isin([ls[i][0] for i in range(len(ls))])) &
(df['READING'].isin([ls[k][1] for k in range(len(ls))]))]
The problem with this one is obviously, that it does not remove line 4, because DEV2 has the READING READ2, but it should remove it.
My second approach was:
df = df[(df[['DEVICE','READING']].isin({'DEVICE': [ls[i][0] for i in range(len(ls))],
'READING': [ls[i][1] for i in range(len(ls))] }))]
This one selects the correct rows but it does not remove the other rows. Instead it sets every other cell to NaN, including the VALUE ROW, which i do want to keep. And It does not accumulate both so row 4 looks like 4 |DEV2|NaN|NaN
What would be the easiest or best way, to solve this problem? Can you help me?
~Fabian
Upvotes: 8
Views: 13843
Reputation: 911
You can use a multi-index to solve this problem.
values = [['DEV1', 'READ1'], ['DEV1', 'READ2'], ['DEV2', 'READ1']]
# DataFrame.loc requires tuples for multi-index lookups
index_values = [tuple(v) for v in values]
filtered = df.set_index(['DEVICE', 'READING']).loc[index_values].reset_index()
print(filtered)
DEVICE READING Timestamp VALUE
0 DEV1 READ1 1 randomvalue
1 DEV1 READ2 2 randomvalue
2 DEV2 READ1 3 randomvalue
Upvotes: 5
Reputation: 332
This should do what you want
import pandas as pd
df = pd.DataFrame({'a':[1,1,0,0,1], 'b':[0,0,1,0,1]})
keepers = [[0,0],[1,1]]
df = df[df.apply(lambda row: [row['a'], row['b']] in keepers, axis=1)]
Upvotes: 1
Reputation: 38415
You can convert the list to list of tuples. Convert the required columns in dataframe to tuples and use isin
l = [['DEV1', 'READ1'], ['DEV1', 'READ2'], ['DEV2','READ1']]
l = [tuple(i) for i in l]
df[df[['DEVICE', 'READING']].apply(tuple, axis = 1).isin(l)]
You get
Timestamp DEVICE READING VALUE
0 1 DEV1 READ1 randomvalue
1 2 DEV1 READ2 randomvalue
2 3 DEV2 READ1 randomvalue
Upvotes: 15
Reputation: 11100
Any reason you don't do it like this?
df.drop([4,5],axis=0,inplace=True)
Upvotes: -1