Reputation: 421
I have a case where I am trying to append the calculated percent value in a understandable format into a column in my dataframe called df. When I say understandable format the output into the column should go like '40% Matched'
, like in the case below.
df = pd.DataFrame({ 'Col1':[['Phone', 'Watch', 'Pen', 'Pencil', 'Knife'],['apple','orange','mango','cherry','banana','kiwi','tomato','avocado']], 'Col2': [['Phone', 'Watch', 'Pen', 'Pencil', 'fork'],['orange','avocado','kiwi','mango','grape','lemon','tomato']]})
df['Matched Percent'] = 'No Match'
for index,(lst1,lst2) in enumerate(zip(df['Col1'],df['Col2'])):
if(lst1 == lst2):
print('100% Matched')
else:
c1 = Counter(lst1)
c2 = Counter(lst2)
matching = {k: c1[k]+c2[k] for k in c1.keys() if k in c2}
text = '% Matched'
if len(lst1) > len(lst2):
out = round(len(matching)/len(lst1)*100)
#df['Matched Percent'].append(out,'% Matched')
print(out,'% Matched')
else:
out = round(len(matching)/len(lst2)*100)
#df['Matched Percent'].append(out,'% Matched')
print(out,'% Matched')
80 % Matched
62 % Matched
TypeError: cannot concatenate object of type "<class 'int'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
I keep getting TypeError. I tried a couple of ways but no luck. I am able to print the values in the way I want in my screen as shown above. But when I append it to my dataframe df
it fails. Appreciate advice on how to solve this.
Upvotes: 1
Views: 4431
Reputation: 21
If I understand your question correctly,
df = pd.DataFrame({ 'Col1':[['Phone', 'Watch', 'Pen', 'Pencil', 'Knife'], ['apple','orange','mango','cherry','banana','kiwi','tomato','avocado']], 'Col2': [['Phone', 'Watch', 'Pen', 'Pencil', 'fork'], ['orange','avocado','kiwi','mango','grape','lemon','tomato']]}) df['Matched Percent'] = 'No Match' for index,(lst1,lst2) in enumerate(zip(df['Col1'],df['Col2'])): if(lst1 == lst2): print('100% Matched') df['Matched Percent'][index] = '{}% Matched'.format(100) else: c1 = Counter(lst1) c2 = Counter(lst2) matching = {k: c1[k]+c2[k] for k in c1.keys() if k in c2} text = '% Matched' if len(lst1) > len(lst2): out = round(len(matching)/len(lst1)*100) #df['Matched Percent'].append(out,'% Matched') print(out,'% Matched') df['Matched Percent'][index] = '{}% Matched'.format(out) else: out = round(len(matching)/len(lst2)*100) #df['Matched Percent'].append(out,'% Matched') print(out,'% Matched') df['Matched Percent'][index] = '{}% Matched'.format(out)
I expect that you have a df with the following condition:
import pandas as pd
result = 'blablabla'
df = pd.DataFrame(data=[[1,2,3],[4,5,6]],
columns=['a','b','Match Percent'])
s = pd.Series(index=df.columns)
df = df.append(s, ignore_index=True)
df['Match Percent'][df.shape[0]-1] = result
print(df)
a b Match Percent
0 1.0 2.0 3
1 4.0 5.0 6
2 NaN NaN blablabla
I think this is what you want, right??
Correction:
df = pd.DataFrame(data=[[1,2],[4,5]],
columns=['a','b'])
df['Match Percent'] = (df['a']/df['b']*100).apply(lambda x: '{} Match'.format(x))
a b Match Percent
0 1 2 50.0 Match
1 4 5 80.0 Match
Upvotes: 0
Reputation: 164693
Your logic seems verbose. You can use a list comprehension:
zipper = zip(map(set, df['Col1']), map(set, df['Col2']))
df['Matched Percent'] = [len(c1 & c2) / max(len(c1), len(c2)) for c1, c2 in zipper]
print(df)
Col1 \
0 [Phone, Watch, Pen, Pencil, Knife]
1 [apple, orange, mango, cherry, banana, kiwi, t...
Col2 Matched Percent
0 [Phone, Watch, Pen, Pencil, fork] 0.800
1 [orange, avocado, kiwi, mango, grape, lemon, t... 0.625
Note there isn't much scope for optimizing such calculations with Pandas, which isn't designed to hold lists in series. If you need "pretty" output, you can use f-strings, supported in Python 3.6+:
print((df['Matched Percent']*100).map(lambda x: f'{x:.0f}% Matched'))
0 80% Matched
1 62% Matched
Name: Matched Percent, dtype: object
Upvotes: 1