Reputation: 879
I have two data frame such as :
>>> df1
query target
A:1 AZ
B:4 AZ
C:5 AZ
D:1 AZ
>>> df2
query target
B:6 AZ
C:5 AZ
D:1 AZ
A:1 AZ
And the idea is simply to check if the values present in the df1['query']
are present in the df2['query']
as well, whatever the order of the row and add a new column the df1 and get:
>>> df1
query target new_col
A:1 AZ present
B:4 AZ Not_present
C:5 AZ present
D:1 AZ present
I tried : df1["new_col"] = df2.apply(lambda row: "present" if row[0] == df1["query"][row.name] else "Not_present", axis = 1)
but it only check match by rows.
Thanks for your help.
EDIT
What if know I have to compare 3 data frames with the df1
Here is the new exemple:
df1
query
A1
A2
B3
B5
B6
B7
C8
C9
df2
query target
C9 type2
Z6 type2
df3
query target
C10 type3
B6 type3
df4
query target
A1 type4
K9 type1
and I will do a loop such as :
for df in dataframes:
df1['new_col'] = np.where(blast['query'].isin(df['query']), 'Present', 'Not_absent')
the issue is that it will override each time the column df1['New_col']
At the end I should get :
df1
query new_col
A1 present_type4
A2. not_present
B3. not_present
B5. not_present
B6. present_type3
B7. not_present
C8. not_present
C9. present_type2
Edit for jezrael
:
In order to open my data frames, I have a file.txt
file such as:
Species1
Species2
Species3
It helps to call the wright path link where a data frame is for exemple:
/admin/user/project/Species1/dataframe.txt etc
so I juste call them to create the df such as :
for i in file.txt:
df = open("/admin/user/project/"+i+"/dataframe.txt","r")
Then I have as I said above to find matches between all these data frame and one big data frame (df1)
.
By doing:
values=[]
for names in file.txt:
values.append("/admin/user/project/"+i+"/dataframe.txt")
for names file.txt:
keys.append(names)
dicts = {}
for i in keys:
dicts[i] = values[i]
d = {}
for i in range(len(keys)):
d[i]=None
for i in range(len(keys)):
d[keys[i]] = d.pop(i)
for (k,v), i in zip( d.items(),values):
d[k] = i
I succeed to get something as you shown me:
but the values are the path to open for the data frames:
>>> d
{'Species1': '/admin/user/project/Species1/dataframe.txt', 'Species2': '/admin/user/project/Species2/dataframe.txt', 'Species3': '/admin/user/project/Species3/dataframe.txt'}
Upvotes: 1
Views: 48
Reputation: 863741
Use numpy.where
with Series.isin
:
df1['new_col'] = np.where(df1['query'].isin(df2['query']), 'present', 'Not_present')
print (df1)
query target new_col
0 A:1 AZ present
1 B:4 AZ Not_present
2 C:5 AZ present
3 D:1 AZ present
EDIT:
d = {'type2':df2, 'type3':df3, 'type4':df4}
df1['new_col'] = 'not_present'
for k, v in d.items():
df1.loc[df1['query'].isin(v['query']), 'new_col'] = 'Present_{}'.format(k)
print (df1)
query new_col
0 A1 Present_type4
1 A2 not_present
2 B3 not_present
3 B5 not_present
4 B6 Present_type3
5 B7 not_present
6 C8 not_present
7 C9 Present_type2
EDIT: You can in loop create DataFrame and pass to isin
:
d = {'Species1': '/admin/user/project/Species1/dataframe.txt', 'Species2': '/admin/user/project/Species2/dataframe.txt', 'Species3': '/admin/user/project/Species3/dataframe.txt'}
df1['new_col'] = 'not_present'
for k, v in d.items():
df = pd.read_csv(v)
df1.loc[df1['query'].isin(df['query']), 'new_col'] = 'Present_{}'.format(k)
Upvotes: 2
Reputation: 5290
One other solution using pd.merge
df_temp = df_2.copy()
df_temp['new_col'] = 'present'
df_temp = df_temp['query', new_col]
df1 = df1.merge(df_temp, how='left', on='query').fillna('Not_present')
Upvotes: 0