Reputation: 879
I have a data frame such as
query subject col1
A dog ok
B cat okl
C cat oklp
D frog ok
E cat ok
F fox ok
and a file.txt such as:
dog
cat
and the idea is to only keep row that have a pattern present in the file.txt. Here I should get :
query subject col1
A dog ok
B cat okl
C cat oklp
E cat ok
I tried with :
file = open('file.txt').read()
df=[]
for row in tab['subject']:
if row in file:
row.append(df)
but it does not seem to be the solution, thank you for your help.
Upvotes: 2
Views: 158
Reputation: 2019
Considering that your data frame is called df, this answer reads the file.txt also as a dataframe and merge both data frames, resulting in the desired result - solution similar to an inner join from SQL:
>> df2 = pd.read_csv('file.txt', header= None, names=['subject'])
>> pd.merge(df, df2, on='subject')
query subject col1
0 A dog ok
1 B cat okl
2 C cat oklp
3 E cat ok
Upvotes: 2
Reputation: 12928
You can use df.loc[]
to grab specific rows where a criterion is met, and isin()
to check whether something in your dataframe shows up in a given list. First make a list of the entries in your file:
vals = [] # placeholder for our file contents
with open('file.txt', 'r') as a_file:
for line in a_file:
vals.append(line)
Next select rows where the subject column is one of the values read from your file. Assuming your dataframe is called df
:
new_df = df.loc[df.subject.isin(vals)]
Side note: you almost never want to iterate over a dataframe. There are exceptions to this, but generally speaking if you find yourself iterating on a dataframe (e.g. for row in df:
), you can rewrite that piece of code to take advantage of vectorized functions in Pandas which are much more efficient.
Upvotes: 3