Reputation: 859
There are a lot of questions similar to this, but none asking this specific question, and I can't figure out how to solve it.
Assume I have a dataframe df
that looks like the one shown below, and assume each column can have other substrings (though, the substrings shown below are all unique to one column per row, eg if one column has 'Coin' in it, no other other column in that row can contain the substring 'Coin').
This example shows whole strings, but assume that a row with 'Coin' in it could actually be 'Coin (quarters)', such that it must search out the substring 'Coin' and not assume 'Coin' is the whole string.
>>> df
1 2 3 4 cap
0 Coin Bill Charge 72
1 Coin 13
2 Charge Coin 1
3 Bill Other Coin Charge 57
4 Charge Bill 11
5 Coin Charge 8
6 Other Bill Charge Coin 99
7 Charge 45
8 Charge Coin 15
9 Bill Other Coin 71
What I want to do is select rows if and only if columns contain 'Coin' and 'Charge' but none of the other unique substrings. Essentially, if and only if a column contains 'Coin' and a column contains 'Charge' and the rest of the columns DO NOT contain 'Bill' or 'Other' or they have an empty string.
So in the example above, the resulting dataframe would be a subset of df
:
>>> df_res
1 2 3 4 cap
2 Charge Coin 1
5 Coin Charge 8
8 Charge Coin 15
Upvotes: 1
Views: 67
Reputation: 195408
Try:
x = df[
df.apply(
lambda x: "Coin" in (v := x.values)
and "Charge" in v
and "Bill" not in v
and "Other" not in v,
axis=1,
)
]
print(x)
Prints:
1 2 3 4 cap
2 Charge Coin 1
5 Coin Charge 8
8 Charge Coin 15
EDIT: To search substrings:
x = df[
df.apply(
lambda x: any("Coin" in i for i in x if isinstance(i, str))
and any("Charge" in i for i in x if isinstance(i, str))
and not any("Bill" in i for i in x if isinstance(i, str))
and not any("Other" in i for i in x if isinstance(i, str)),
axis=1,
)
]
print(x)
Prints:
1 2 3 4 cap
2 Charge Coin(quarters) 1
5 Coin Charge 8
8 Charge Coin 15
Upvotes: 2