Tim
Tim

Reputation: 113

Unable to write lookup output to Pandas dataframe using append loop

I want to lookup column in dataframe (Lookup item) using pandas (or anything else). If the value is found in that column then corresponding value in another column (Corresponding item) of same dataframe is picked up with both recorded into another dataframe.

Example cols are:

  1. lookup_id = [111, 222, 333, 444, 777 , 1089 , 3562 ]
  2. id_number = [111, 23, 333, 444, 10101 ,777 , 222 ]
items = arr.array('i', [111, 222, 333, 444, 777 , 1089 , 3562 ])
new_df = []

for item in items:
    lookup_id = item
    
    id_number = get_score_by_id(df_past, lookup_id)
    if id_number is not None:
       df_append = (f'{lookup_id}, {id_number}')

    else:
       df_append = ('NAN')

print(df_append)

The required output dataframe is shown below:

Lookup item Corresponding item
111 OK
222 OK
333 OK
444 OK
777 OK
1089 NAN
3562 NAN

So here all are found so 'OK' returned and 1089 and 3562 do not exist so NAN recored.

The script has been developed to replace an xls VLOOKUP so it's a vlookup/append but cant get the append to add new fields in next row fresh df.

I can get it to work just printing the output but want the new_df populated and that’s primarily intent of question.

Thanks.

Upvotes: 0

Views: 59

Answers (2)

jalopezp
jalopezp

Reputation: 839

A lot of functionality that has to do with comparing values in Pandas can be achieved through merge. In this case:

lookup_id = pd.DataFrame([111, 222, 333, 444, 777, 1089, 3562], columns=['id'])
id_number = pd.DataFrame([111, 23, 333, 444, 10101, 777, 222], columns=['id'])

lookup_id.merge(id_number, how='left', indicator=True)

returns

    id      _merge
0   111     both
1   222     both
2   333     both
3   444     both
4   777     both
5   1089    left_only
6   3562    left_only

The indicator parameter also accepts a string that it will use as the column title, and you can further process the output:

(lookup_id
    .merge(id_number, how='left', indicator='corr')
    .assign(corr=lambda df: df['corr'].map({'both':'OK', 'left_only':'NAN'}))
)

which then returns:

    id      corr
0   111     OK
1   222     OK
2   333     OK
3   444     OK
4   777     OK
5   1089    NAN
6   3562    NAN

Upvotes: 0

furas
furas

Reputation: 142985

If you need only OK, NAN then you could use .isin() to get True/False
and later you can convert it to strings.

import pandas as pd
import numpy as np

df = pd.DataFrame({'lookup_id':[111, 222, 333, 444, 777, 1089, 3562]})
df_past = pd.DataFrame({'id_number': [111, 23, 333, 444, 10101, 777, 222]})

df['result'] = df['lookup_id'].isin(df_past['id_number'])
df['result'] = np.where(df['result'], 'OK', 'NAN')

print(df)

Result:

   lookup_id result
0        111     OK
1        222     OK
2        333     OK
3        444     OK
4        777     OK
5       1089    NAN
6       3562    NAN

If you need more information from other dataframe then you can use .apply(function) and inside function get more data and format it

import pandas as pd
import numpy as np

df = pd.DataFrame({'lookup_id':[111, 222, 333, 444, 777, 1089, 3562]})
df_past = pd.DataFrame({'id_number': [111, 23, 333, 444, 10101, 777, 222]})

df['result'] = df['lookup_id'].isin(df_past['id_number'])

def function(lookup_id):
    #print('lookup_id:', lookup_id)

    row = df_past[ df_past['id_number'] == lookup_id ]
    #print(row)

    if not row.empty:
        return f'{lookup_id}, {row.index[0]}'
       
    return 'NAN'
    
df['result'] = df['lookup_id'].apply(function)

print(df)

Result:

   lookup_id  result
0        111  111, 0
1        222  222, 6
2        333  333, 2
3        444  444, 3
4        777  777, 5
5       1089     NAN
6       3562     NAN

Upvotes: 0

Related Questions