Reputation: 239
Trying to learn ways to iterate or cycle thru a column in pandas. In vba this is a for-loop and then selecting offset from selected cell position, just one option. However, I'm here to learn pandas and having a hard time understanding how to keep the rows straight when comparing the next column adjacency to the right or two columns over. Another way to maybe say this. Once ttype column text is found in other dataframe mtype column, I would like to compare the adjacent values in both dataframes with each other.
I have attached the dataframes for testing. I'm not sure if a for loop is the best way to achieve this but I have started one. I'm reading that pandas is more efficient handling the entire column at one time. Not sure if that can be done here. My first 3 lines of code (2 for-loops and if statement) are working. It cycles thru the text and finds the match. But I'm grappling with the handling of the adjacency values. I have reading up on the iloc and loc statements because I feel they grab the row. But I'm not sure of the syntax. I'm not even sure I can ask the right question to get me where I need so I can learn. So any reading material that you can help guide me to about this would be appreciated. pandas loc vs. iloc vs. ix vs. at vs. iat? get column value based on another column with list of strings in pandas dataframe
What is needed: With the toc dataframe I would like to cycle thru each value in ttype column, if value exist in the moc dataframe mtype column, then compare toc[ta column value] < moc[ma column value], if true, then continue, if false then toc[outfilter] == '1'.
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
toc = {'ttype':['ta1k', 'brek', 'sjfgd',
'gru2d','brek','crhe','ta1k','jump4'],
'ta':[1, 2, 9, 9, 2, 2, 1, 1],
'tc':[0, 1, 0, 0, 1, 0, 2, 0],
'outfilter':[0, 0, 0, 0,0, 0, 0, 0]}
toc = pd.DataFrame(toc)
moc = {'mtype':[ 'sjfgd','ta1k','gru2d',
'brek','crhe','jump4'],
'mo':[2, 2, 4, 4, 3, 4],
'ma':[2, 2, 4, 4, 2, 3],
'mc':[1, 1, 3, 3, 1, 1]}
moc = pd.DataFrame(moc)
#-----
for tval in toc['ttype']: # Gets toc['ttype'].value
for mval in moc['mtype']: # Gets toc['mtype'].value
if t == m: # compares if tval == mval
if toc.loc['ta'] < moc.loc['ma']: # compares toc.[ta] column value < moc.[ma]
continue
else:
toc.loc['outfilter'] = '1' # if the above is greater place '1' in outfilter
# column
else:
continue
#-----
print(toc)
print(moc)
What I would like to do: The '1's located in the outfilter column are a result of the toc-df[ta
column value] being greater than moc-df[ma column value].
toc-df ttype ta tc outfilter
0 ta1k 1 0 0
1 brek 2 1 0
2 sjfgd 9 0 1
3 gru2d 9 0 1
4 brek 2 1 0
5 crhe 2 0 0
6 ta1k 1 2 0
7 jump4 1 0 0
I really appreciate your guys helping out and one day I hope that I can return the favor and pay it forward. Thank You for your time.!!! Please let me know if if you have any questions.
Upvotes: 1
Views: 891
Reputation: 16683
ttype
and mtype
columns similar to do an index match/vlookup in Excel, but you don't want to merge the entire moc
dataframe, so just specify and merge in the columns you need (mtype
and ma
).np.where
to see if ta
values are greater than ma
values and return 1
or 0
similar to an Excel if
formula.input:
import pandas as pd, numpy as np
toc = {'ttype':['ta1k', 'brek', 'sjfgd',
'gru2d','brek','crhe','ta1k','jump4'],
'ta':[1, 2, 9, 9, 2, 2, 1, 1],
'tc':[0, 1, 0, 0, 1, 0, 2, 0],
'outfilter':[0, 0, 0, 0,0, 0, 0, 0]}
toc = pd.DataFrame(toc)
moc = {'mtype':[ 'sjfgd','ta1k','gru2d',
'brek','crhe','jump4'],
'mo':[2, 2, 4, 4, 3, 4],
'ma':[2, 2, 4, 4, 2, 3],
'mc':[1, 1, 3, 3, 1, 1]}
moc = pd.DataFrame(moc)
code:
toc = pd.merge(toc,moc[['mtype','ma']],how='left',left_on='ttype',right_on='mtype')
toc['outfilter'] = np.where((toc['ta'] > toc['ma']),1,0)
toc = toc.drop(['mtype','ma'], axis=1)
toc
breakdown of code line-by-line:
step 1 (similar to an excel index-match
formula):
pd.merge(toc,moc[['mtype','ma']],how='left',left_on='ttype',right_on='mtype')
ttype ta tc outfilter mtype ma
0 ta1k 1 0 0 ta1k 2
1 brek 2 1 0 brek 4
2 sjfgd 9 0 0 sjfgd 2
3 gru2d 9 0 0 gru2d 4
4 brek 2 1 0 brek 4
5 crhe 2 0 0 crhe 2
6 ta1k 1 2 0 ta1k 2
7 jump4 1 0 0 jump4 3
step 2 (similar to an excel IF
formula):
toc['outfilter'] = np.where((toc['ta'] > toc['ma']),1,0)
ttype ta tc outfilter mtype ma
0 ta1k 1 0 0 ta1k 2
1 brek 2 1 0 brek 4
2 sjfgd 9 0 1 sjfgd 2
3 gru2d 9 0 1 gru2d 4
4 brek 2 1 0 brek 4
5 crhe 2 0 0 crhe 2
6 ta1k 1 2 0 ta1k 2
7 jump4 1 0 0 jump4 3
step 3 - final output (just dropping unneeded columns):
toc = toc.drop(['mtype','ma'], axis=1)
ttype ta tc outfilter
0 ta1k 1 0 0
1 brek 2 1 0
2 sjfgd 9 0 1
3 gru2d 9 0 1
4 brek 2 1 0
5 crhe 2 0 0
6 ta1k 1 2 0
7 jump4 1 0 0
If I thought a little bit more, there is probably an even simpler way to do this in python with just one line of cod using pandas methods, but this way is simple enough and easy to understand.
Also, VBA was the language I switched to from Pandas about 18 months ago as well. I would say that 99% of problems can be solved with pandas methods, list comprehension or .apply(lambda x:...
. Pandas methods or numpy methods are always the way to go in terms of simplicity, speed, performance, etc. It is very popular to loop in VBA, but you should try to get away from that as quickly as you can and learn the various pandas methods.
Upvotes: 1