Reputation: 8357
I have a two-dimensional (or more) pandas DataFrame like this:
>>> import pandas as pd
>>> df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
>>> df
   A  B
0  0  1
1  2  3
2  4  5
Now suppose I have a numpy array like np.array([2,3])
and want to check if there is any row in df
that matches the contents of my array. Here the answer should obviously be true, but e.g. np.array([1,2])
should return false, as there is no row with both a 1 in column A and a 2 in column B.
Surely this is easy, but I don't see it right now.
Upvotes: 83
Views: 187222
Reputation: 77
You can also convert the df
to a list of records and check that way. Each record is a row, represented as a dictionary of {column name: value} pairs:
records = df.to_dict(orient="records")
a = np.array([2, 3])
a_record = dict(zip(df.columns, a)) # turn your array into a dictionary record
a_record in records
Or check against a list of tuples:
tuples = list(df.itertuples(index=False))
a = np.array([2, 3])
a_tuple = tuple(a)
a_tuple in tuples
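For reference, both membership tests should evaluate to True for the array from the question: NumPy integers compare equal to Python integers, and the namedtuples yielded by itertuples compare equal to plain tuples of the same values.
>>> a_record in records
True
>>> a_tuple in tuples
True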
Upvotes: 0
Reputation: 44
If you want to return the row where the matches occurred:
resulting_row = df[(df['A'] == 2) & (df['B'] == 3)].values
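If you only need a yes/no answer from that result, a minimal follow-up sketch (using the resulting_row defined above) is to test whether anything came back:
row_exists = len(resulting_row) > 0  # True if at least one row matched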
Upvotes: 0
Reputation: 427
A simple solution with a dictionary:
import pandas as pd

def check_existance(dict_of_values, df):
    # start from an all-True mask (each value of the first column equals itself),
    # then AND in one condition per {column: value} pair
    v = df.iloc[:, 0] == df.iloc[:, 0]
    for key, value in dict_of_values.items():
        v &= (df[key] == value)
    return v.any()

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

this_row_exists = {'A': 2, 'B': 3}
check_existance(this_row_exists, df)
# True

this_row_does_not_exist = {'A': 2, 'B': 5}
check_existance(this_row_does_not_exist, df)
# False
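As a side note (my own variation, not part of the original answer), the initial all-True mask can also be built explicitly; this is a drop-in replacement for the first line of the function body and behaves the same even if the first column contains NaN:
v = pd.Series(True, index=df.index)  # explicit all-True starting mask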
Upvotes: 1
Reputation:
To find rows where a single column equals a certain value:
df[df['column name'] == value]
To find rows where multiple columns equal different values, note the parentheses around each comparison:
df[(df["Col1"] == Value1) & (df["Col2"] == Value2) & ....]
Upvotes: 2
Reputation: 800
An answer that works with larger DataFrames, so you don't need to manually write a check for each column:
import pandas as pd
import numpy as np
#define variables
df = pd.DataFrame([[0,1],[2,3],[4,5]], columns=['A', 'B'])
a = np.array([2,3])
def check_if_np_array_is_in_df(df, a):
    # transform a into a one-row dataframe with the same columns as df
    da = pd.DataFrame(np.expand_dims(a, axis=0), columns=df.columns)
    # drop duplicates from df so the count below can only be 0 or 1
    ddf = df.drop_duplicates()
    # if the row already occurs in ddf, concatenating and dropping duplicates
    # removes it again, so the size difference is 1 (found) or 0 (not found)
    result = pd.concat([ddf, da]).shape[0] - pd.concat([ddf, da]).drop_duplicates().shape[0]
    return result
print(check_if_np_array_is_in_df(df, a))
print(check_if_np_array_is_in_df(df, [1,3]))
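Note that the function returns a count of matching rows (0 or 1 after the drop_duplicates) rather than a boolean; if you want True/False, wrapping the result in bool() is enough:
print(bool(check_if_np_array_is_in_df(df, a)))       # True
print(bool(check_if_np_array_is_in_df(df, [1, 3])))  # False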
Upvotes: 0
Reputation: 11460
If you also want to return the index where the matches occurred:
index_list = df[(df['A'] == 2) & (df['B'] == 3)].index.tolist()
Upvotes: 14
Reputation: 8357
Turns out it is really easy; the following does the job here:
>>> ((df['A'] == 2) & (df['B'] == 3)).any()
True
>>> ((df['A'] == 1) & (df['B'] == 2)).any()
False
Maybe somebody comes up with a better solution which allows directly passing in the array and the list of columns to match (a sketch along those lines follows below).
Note that the parentheses around df['A'] == 2 are not optional, since the & operator binds more tightly than the == operator.
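As a possible answer to the wish above, here is a sketch of a generic helper (row_exists is a hypothetical name, not a pandas function) that takes the array and the list of columns to match and compares them in one vectorized step:
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [4, 5]], columns=['A', 'B'])

def row_exists(df, arr, cols):
    # compare the selected columns against the array element-wise,
    # keep rows where every column matches, and check whether any row does
    return (df[list(cols)] == np.asarray(arr)).all(axis=1).any()

print(row_exists(df, np.array([2, 3]), ['A', 'B']))  # True
print(row_exists(df, np.array([1, 2]), ['A', 'B']))  # False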
Upvotes: 107