Reputation: 1218
I have a Pandas DataFrame that looks like this:
    AA   BB   CC    _id1    _id2    _id3
0  foo  foo  foo  1A1A1A  1A1A1A  1A1A1A
1  foo  foo  foo  2B2B2B  4L4L4L  2B2B2B
I want to isolate rows whose _id values are not all identical. In the example above, that would select the second row and ignore the first.
I do not know how many _id columns will exist (and need checking) in any given run, and their names will vary (but they will be available as a list of strings, id_column_names). The names will also always have the _id prefix.
Is there any table-wise (or at least series-wise) operation that is reasonably efficient here?
I can of course write some function (to iterate over the relevant row values) and apply() it row-wise, but is there a better option?
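For concreteness, a minimal sketch of the row-wise apply() baseline I have in mind (assuming id_column_names holds the _id column names, as above):
import pandas as pd

df = pd.DataFrame({
    "AA": ["foo", "foo"], "BB": ["foo", "foo"], "CC": ["foo", "foo"],
    "_id1": ["1A1A1A", "2B2B2B"],
    "_id2": ["1A1A1A", "4L4L4L"],
    "_id3": ["1A1A1A", "2B2B2B"],
})
id_column_names = ["_id1", "_id2", "_id3"]

# keep rows where the _id values are not all identical
mask = df[id_column_names].apply(lambda row: row.nunique() > 1, axis=1)
print(df[mask])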
Upvotes: 2
Views: 186
Reputation: 863266
First filter the _id columns with DataFrame.filter, then use DataFrame.eq to compare each of them against the first _id column and DataFrame.all to test whether every value in a row matches:
# select only the columns whose names start with _id
df1 = df.filter(regex='^_id')
# keep rows where the _id values are not all equal to the first _id column
df = df[~df1.eq(df1.iloc[:, 0], axis=0).all(axis=1)]
print(df)
AA BB CC _id1 _id2 _id3
1 foo foo foo 2B2B2B 4L4L4L 2B2B2B
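If the list of column names is already available, the regex step can be skipped and the same test written with DataFrame.nunique (a sketch, using id_column_names from the question; note that nunique treats NaN slightly differently from the eq comparison):
df1 = df[id_column_names]
# rows with more than one distinct _id value are the non-identical ones
df = df[df1.nunique(axis=1) > 1]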
Upvotes: 2
Reputation: 15498
Yeah, use df.filter:
df.filter(regex=r'^[_]')
This selects the columns whose names start with an underscore (the _id columns here). To remove those columns instead:
df.filter(regex=r'^[^_]')
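For example, on the sample frame from the question, the second call drops the _id columns (a sketch):
kept = df.filter(regex=r'^[^_]')
print(kept)
    AA   BB   CC
0  foo  foo  foo
1  foo  foo  foo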
Upvotes: 0