Reputation: 8554
Here is my dataframe 'df':
match         name                   group
adamant       Adamant Home Network   86
adamant       ADAMANT, Ltd.          86
adamant bild  TOV Adamant-Bild       86
360works      360WORKS               94
360works      360works.com           94
For each group number, I want to compare the names one by one and check whether each one matches the same word in the 'match' column.
The desired output is a count per group: if a name matches, it counts as 'TP' (true positive); if not, it counts as 'FN' (false negative).
My idea was to count the number of match words per group number, but that doesn't fully give me what I want:
df.groupby('group').count()
Does anybody have an idea how to do this?
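For reference, a minimal sketch of the computation described above that produces actual TP/FN counts per group. It assumes that "matches" means: the 'match' value, lowercased and stripped of everything except letters and digits, is contained in the 'name' value normalized the same way. That rule is an assumption, and the `normalize` helper is hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "match": ["adamant", "adamant", "adamant bild", "360works", "360works"],
        "name": ["Adamant Home Network", "ADAMANT, Ltd.", "TOV Adamant-Bild",
                 "360WORKS", "360works.com"],
        "group": [86, 86, 86, 94, 94],
    }
)


def normalize(text):
    # Hypothetical helper: lowercase and keep only letters and digits,
    # e.g. "TOV Adamant-Bild" -> "tovadamantbild".
    return re.sub(r"[^a-z0-9]", "", text.lower())


# Assumed matching rule: normalized 'match' is a substring of normalized 'name'.
is_match = [normalize(m) in normalize(n) for m, n in zip(df["match"], df["name"])]
df["label"] = ["TP" if ok else "FN" for ok in is_match]

# Count labels per group; reindex keeps both columns even if one never occurs.
counts = (
    df.groupby("group")["label"]
    .value_counts()
    .unstack(fill_value=0)
    .reindex(columns=["TP", "FN"], fill_value=0)
)
print(counts)
```

With this sample data every row passes, so group 86 gets 3 TP and group 94 gets 2 TP, with 0 FN for both.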
Upvotes: 0
Views: 3517
Reputation: 1819
If I understood your question correctly, this should do the job:
import re
import pandas

df = pandas.DataFrame([['adamant', 'Adamant Home Network', 86], ['adamant', 'ADAMANT, Ltd.', 86],
                       ['adamant bild', "TOV Adamant-Bild", 86], ['360works', '360WORKS', 94],
                       ['360works ', "360works.com ", 94]], columns=['match', 'name', 'group'])

def my_function(group):
    for i, row in group.iterrows():
        # keep only the letters of each column, lowercase them,
        # and check that 'match' is contained in 'name'
        if ''.join(re.findall("[a-zA-Z]+", row['match'])).lower() not in ''.join(
                re.findall("[a-zA-Z]+", row['name'])).lower():
            # if one of the inclusions fails, the whole group is 'FN'
            return 'FN'
    # if all inclusions succeed, the group is 'TP'
    return 'TP'

res_series = df.groupby('group').apply(my_function)
res_series.name = 'count'
res_df = res_series.reset_index()
print(res_df)
This gives you the following DataFrame:
   group count
0     86    TP
1     94    TP
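The same "every row in the group must pass" rule can also be written without `iterrows`, by building one boolean per row and reducing it with `groupby(...).all()`. This is a sketch that reuses the answer's letters-only normalization, so digits such as the "360" in "360works" are ignored, exactly as in `my_function`; the names `match_norm`, `name_norm` and `row_ok` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["adamant", "Adamant Home Network", 86], ["adamant", "ADAMANT, Ltd.", 86],
     ["adamant bild", "TOV Adamant-Bild", 86], ["360works", "360WORKS", 94],
     ["360works ", "360works.com ", 94]],
    columns=["match", "name", "group"],
)

# Same normalization as my_function: keep letters only, then lowercase.
match_norm = df["match"].str.replace(r"[^a-zA-Z]", "", regex=True).str.lower()
name_norm = df["name"].str.replace(r"[^a-zA-Z]", "", regex=True).str.lower()

# True where the normalized 'match' is a substring of the normalized 'name'.
row_ok = pd.Series([m in n for m, n in zip(match_norm, name_norm)], index=df.index)

# A group is 'TP' only if all of its rows pass, mirroring my_function.
res_df = (
    row_ok.groupby(df["group"]).all()
    .map({True: "TP", False: "FN"})
    .rename("count")
    .reset_index()
)
print(res_df)
```

It should print the same two-row result, with TP for both groups 86 and 94.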
Upvotes: 1
Reputation: 16134
This function compares the name and match columns row by row within each group:
def apply_func(group):
    # exact, case-sensitive comparison of the two columns, row by row
    x = group['name'] == group['match']
    return x.map({False: 'FN', True: 'TP'})

result = df.join(df.groupby('group').apply(apply_func).reset_index(), rsuffix='_1', how='left')
print(result)

          match                  name  group  group_1  level_1   0
0       adamant  Adamant Home Network     86       86        0  FN
1       adamant         ADAMANT, Ltd.     86       86        1  FN
2  adamant bild      TOV Adamant-Bild     86       86        2  FN
3      360works              360WORKS     94       94        3  FN
4      360works          360works.com     94       94        4  FN
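Since this comparison is purely row by row, the groupby and the join are not strictly required; the label can be written straight into a new column. A small sketch, assuming NumPy is available, that also adds a case-insensitive variant (the `exact` and `ci` column names are hypothetical) to show why exact equality marks every row as FN on this data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["adamant", "Adamant Home Network", 86], ["adamant", "ADAMANT, Ltd.", 86],
     ["adamant bild", "TOV Adamant-Bild", 86], ["360works", "360WORKS", 94],
     ["360works", "360works.com", 94]],
    columns=["match", "name", "group"],
)

# Exact, case-sensitive equality, same rule as apply_func, no groupby needed.
df["exact"] = np.where(df["name"] == df["match"], "TP", "FN")

# Case-insensitive equality: only differences in letter case are forgiven.
df["ci"] = np.where(df["name"].str.lower() == df["match"].str.lower(), "TP", "FN")
print(df)
```

Only the fourth row ("360works" vs "360WORKS") becomes TP once case is ignored; the other names contain extra words or punctuation, so a normalized containment check like the one in the first answer may fit the question better.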
Upvotes: 1