Reputation: 8090
I'd like to extract those rows from df1
which are not existent in df2
(identity is the index). For the below example, I would expect the first row in df1
to be returned. Unfortunately, the result is empty.
import pandas as pd
df1 = pd.DataFrame({
'level-0': ['a', 'a', 'a', 'a', 'a', 'a'],
'level-1': ['s2', 's2', 's2', 's2', 's2', 's2'],
'level-2': ['1', '1', '1', '1', '1', '1'],
'level-3': ['19', '20', '21', '22', '23', '24'],
'level-4': ['HRB', 'HRB', 'HRB', 'HRB', 'HRB', 'HRB'],
'name': ['a', 'b', 'c', 'd', 'e', 'f']
})
df1 = df1.set_index(['level-0', 'level-1', 'level-2', 'level-3', 'level-4'], drop=False)
df2 = pd.DataFrame({
'level-0': ['a', 'a', 'a', 'a', 'a', 'b'],
'level-1': ['s2', 's2', 's2', 's2', 's2', 's2'],
'level-2': ['1', '1', '1', '1', '1', '1'],
'level-3': ['19', '20', '21', '22', '23', '24'],
'level-4': ['HRB', 'HRB', 'HRB', 'HRB', 'HRB', 'HRB']
})
df2 = df2.set_index(['level-0', 'level-1', 'level-2', 'level-3', 'level-4'], drop=False)
# all indices that are in df1 but not in df2
df_unknown = df1[~df1.index.isin(df2.index)]
print df_unknown
What's wrong with the selection?
Update
I figured out what went wrong. The dataframes were read from an Excel file and some Series were interpreted as int
, while the dataframe to compare with had its columns already converted to str
. This resulted in different indices.
Upvotes: 0
Views: 358
Reputation: 32212
set_index
is not in place by default, so df1
and df2
still have their numeric index after the call. Do either
df2.set_index(..., inplace=True)
or
df2 = df2.set_index(...)
You will see that by far the most methods in pandas work that way.
Upvotes: 1