Reputation: 113
I want to lookup column in dataframe (Lookup item) using pandas (or anything else). If the value is found in that column then corresponding value in another column (Corresponding item) of same dataframe is picked up with both recorded into another dataframe.
Example cols are:
items = arr.array('i', [111, 222, 333, 444, 777 , 1089 , 3562 ])
new_df = []
for item in items:
lookup_id = item
id_number = get_score_by_id(df_past, lookup_id)
if id_number is not None:
df_append = (f'{lookup_id}, {id_number}')
else:
df_append = ('NAN')
print(df_append)
The required output dataframe is shown below:
Lookup item | Corresponding item |
---|---|
111 | OK |
222 | OK |
333 | OK |
444 | OK |
777 | OK |
1089 | NAN |
3562 | NAN |
So here all are found so 'OK' returned and 1089 and 3562 do not exist so NAN recored.
The script has been developed to replace an xls VLOOKUP so it's a vlookup/append but cant get the append to add new fields in next row fresh df.
I can get it to work just printing the output but want the new_df populated and that’s primarily intent of question.
Thanks.
Upvotes: 0
Views: 59
Reputation: 839
A lot of functionality that has to do with comparing values in Pandas can be achieved through merge
. In this case:
lookup_id = pd.DataFrame([111, 222, 333, 444, 777, 1089, 3562], columns=['id'])
id_number = pd.DataFrame([111, 23, 333, 444, 10101, 777, 222], columns=['id'])
lookup_id.merge(id_number, how='left', indicator=True)
returns
id _merge
0 111 both
1 222 both
2 333 both
3 444 both
4 777 both
5 1089 left_only
6 3562 left_only
The indicator
parameter also accepts a string that it will use as the column title, and you can further process the output:
(lookup_id
.merge(id_number, how='left', indicator='corr')
.assign(corr=lambda df: df['corr'].map({'both':'OK', 'left_only':'NAN'}))
)
which then returns:
id corr
0 111 OK
1 222 OK
2 333 OK
3 444 OK
4 777 OK
5 1089 NAN
6 3562 NAN
Upvotes: 0
Reputation: 142985
If you need only OK
, NAN
then you could use .isin()
to get True
/False
and later you can convert it to strings.
import pandas as pd
import numpy as np
df = pd.DataFrame({'lookup_id':[111, 222, 333, 444, 777, 1089, 3562]})
df_past = pd.DataFrame({'id_number': [111, 23, 333, 444, 10101, 777, 222]})
df['result'] = df['lookup_id'].isin(df_past['id_number'])
df['result'] = np.where(df['result'], 'OK', 'NAN')
print(df)
Result:
lookup_id result
0 111 OK
1 222 OK
2 333 OK
3 444 OK
4 777 OK
5 1089 NAN
6 3562 NAN
If you need more information from other dataframe then you can use .apply(function)
and inside function get more data and format it
import pandas as pd
import numpy as np
df = pd.DataFrame({'lookup_id':[111, 222, 333, 444, 777, 1089, 3562]})
df_past = pd.DataFrame({'id_number': [111, 23, 333, 444, 10101, 777, 222]})
df['result'] = df['lookup_id'].isin(df_past['id_number'])
def function(lookup_id):
#print('lookup_id:', lookup_id)
row = df_past[ df_past['id_number'] == lookup_id ]
#print(row)
if not row.empty:
return f'{lookup_id}, {row.index[0]}'
return 'NAN'
df['result'] = df['lookup_id'].apply(function)
print(df)
Result:
lookup_id result
0 111 111, 0
1 222 222, 6
2 333 333, 2
3 444 444, 3
4 777 777, 5
5 1089 NAN
6 3562 NAN
Upvotes: 0