pandas iterate thru column to find text match, once found compare the adjacent row values in both dataframes

Question

Trying to learn ways to iterate or cycle thru a column in pandas. In vba this is a for-loop and then selecting offset from selected cell position, just one option. However, I'm here to learn pandas and having a hard time understanding how to keep the rows straight when comparing the next column adjacency to the right or two columns over. Another way to maybe say this. Once ttype column text is found in other dataframe mtype column, I would like to compare the adjacent values in both dataframes with each other.

I have attached the dataframes for testing. I'm not sure if a for loop is the best way to achieve this but I have started one. I'm reading that pandas is more efficient handling the entire column at one time. Not sure if that can be done here. My first 3 lines of code (2 for-loops and if statement) are working. It cycles thru the text and finds the match. But I'm grappling with the handling of the adjacency values. I have reading up on the iloc and loc statements because I feel they grab the row. But I'm not sure of the syntax. I'm not even sure I can ask the right question to get me where I need so I can learn. So any reading material that you can help guide me to about this would be appreciated. pandas loc vs. iloc vs. ix vs. at vs. iat? get column value based on another column with list of strings in pandas dataframe

What is needed: With the toc dataframe I would like to cycle thru each value in ttype column, if value exist in the moc dataframe mtype column, then compare toc[ta column value] < moc[ma column value], if true, then continue, if false then toc[outfilter] == '1'.

import pandas as pd
from pandas import DataFrame, Series
import numpy as np

toc = {'ttype':['ta1k', 'brek', 'sjfgd',
           'gru2d','brek','crhe','ta1k','jump4'],
       'ta':[1, 2, 9, 9, 2, 2, 1, 1],
       'tc':[0, 1, 0, 0, 1, 0, 2, 0],
       'outfilter':[0, 0, 0, 0,0, 0, 0, 0]}

toc = pd.DataFrame(toc)

moc = {'mtype':[ 'sjfgd','ta1k','gru2d',
            'brek','crhe','jump4'],
       'mo':[2, 2, 4, 4, 3, 4],
       'ma':[2, 2, 4, 4, 2, 3],
       'mc':[1, 1, 3, 3, 1, 1]}

moc = pd.DataFrame(moc)

#-----
for tval in toc['ttype']:                          # Gets toc['ttype'].value
    for mval in moc['mtype']:                      # Gets toc['mtype'].value
        if t == m:                                 # compares if tval == mval
            if toc.loc['ta'] < moc.loc['ma']:      # compares toc.[ta] column value < moc.[ma]
                continue
            else:
                toc.loc['outfilter'] = '1'         # if the above is greater place '1' in outfilter 
                                                   #  column
        else:
            continue
#-----
 print(toc)
 print(moc)


 What I would like to do:  The '1's located in the outfilter column are a result of the toc-df[ta 
 column value] being greater than moc-df[ma column value]. 

 toc-df  ttype   ta   tc   outfilter
     0   ta1k    1    0    0
     1   brek    2    1    0
     2   sjfgd   9    0    1
     3   gru2d   9    0    1
     4   brek    2    1    0
     5   crhe    2    0    0
     6   ta1k    1    2    0
     7   jump4   1    0    0

I really appreciate your guys helping out and one day I hope that I can return the favor and pay it forward. Thank You for your time.!!! Please let me know if if you have any questions.

David Erickson · Accepted Answer

I would merge the dataframes on the ttype and mtype columns similar to do an index match/vlookup in Excel, but you don't want to merge the entire moc dataframe, so just specify and merge in the columns you need (mtype and ma).
From there, simply do an np.where to see if ta values are greater than ma values and return 1 or 0 similar to an Excel if formula.
Finally, drop the columns you don't need

input:

import pandas as pd, numpy as np
toc = {'ttype':['ta1k', 'brek', 'sjfgd',
       'gru2d','brek','crhe','ta1k','jump4'],
   'ta':[1, 2, 9, 9, 2, 2, 1, 1],
   'tc':[0, 1, 0, 0, 1, 0, 2, 0],
   'outfilter':[0, 0, 0, 0,0, 0, 0, 0]}

toc = pd.DataFrame(toc)

moc = {'mtype':[ 'sjfgd','ta1k','gru2d',
        'brek','crhe','jump4'],
   'mo':[2, 2, 4, 4, 3, 4],
   'ma':[2, 2, 4, 4, 2, 3],
   'mc':[1, 1, 3, 3, 1, 1]}

moc = pd.DataFrame(moc)

code:

toc = pd.merge(toc,moc[['mtype','ma']],how='left',left_on='ttype',right_on='mtype')
toc['outfilter'] = np.where((toc['ta'] > toc['ma']),1,0)
toc = toc.drop(['mtype','ma'], axis=1)
toc

breakdown of code line-by-line:

step 1 (similar to an excel index-match formula):

pd.merge(toc,moc[['mtype','ma']],how='left',left_on='ttype',right_on='mtype')

   ttype  ta  tc  outfilter  mtype  ma
0   ta1k   1   0          0   ta1k   2
1   brek   2   1          0   brek   4
2  sjfgd   9   0          0  sjfgd   2
3  gru2d   9   0          0  gru2d   4
4   brek   2   1          0   brek   4
5   crhe   2   0          0   crhe   2
6   ta1k   1   2          0   ta1k   2
7  jump4   1   0          0  jump4   3

step 2 (similar to an excel IF formula):

toc['outfilter'] = np.where((toc['ta'] > toc['ma']),1,0)

    ttype  ta  tc  outfilter  mtype  ma
0   ta1k   1   0          0   ta1k   2
1   brek   2   1          0   brek   4
2  sjfgd   9   0          1  sjfgd   2
3  gru2d   9   0          1  gru2d   4
4   brek   2   1          0   brek   4
5   crhe   2   0          0   crhe   2
6   ta1k   1   2          0   ta1k   2
7  jump4   1   0          0  jump4   3

step 3 - final output (just dropping unneeded columns):

toc = toc.drop(['mtype','ma'], axis=1)

   ttype  ta  tc  outfilter
0   ta1k   1   0          0
1   brek   2   1          0
2  sjfgd   9   0          1
3  gru2d   9   0          1
4   brek   2   1          0
5   crhe   2   0          0
6   ta1k   1   2          0
7  jump4   1   0          0

If I thought a little bit more, there is probably an even simpler way to do this in python with just one line of cod using pandas methods, but this way is simple enough and easy to understand.

Also, VBA was the language I switched to from Pandas about 18 months ago as well. I would say that 99% of problems can be solved with pandas methods, list comprehension or .apply(lambda x:.... Pandas methods or numpy methods are always the way to go in terms of simplicity, speed, performance, etc. It is very popular to loop in VBA, but you should try to get away from that as quickly as you can and learn the various pandas methods.

pandas iterate thru column to find text match, once found compare the adjacent row values <> in both dataframes

Answers (1)

Related Questions

pandas iterate thru column to find text match, once found compare the adjacent row values &lt;&gt; in both dataframes

Answers (1)

Related Questions

pandas iterate thru column to find text match, once found compare the adjacent row values <> in both dataframes