Append column in a dataframe with string values

Question

I have a case where I am trying to append the calculated percent value in a understandable format into a column in my dataframe called df. When I say understandable format the output into the column should go like '40% Matched', like in the case below.

df = pd.DataFrame({ 'Col1':[['Phone', 'Watch', 'Pen', 'Pencil', 'Knife'],['apple','orange','mango','cherry','banana','kiwi','tomato','avocado']], 'Col2': [['Phone', 'Watch', 'Pen', 'Pencil', 'fork'],['orange','avocado','kiwi','mango','grape','lemon','tomato']]})

df['Matched Percent'] = 'No Match'

for index,(lst1,lst2) in enumerate(zip(df['Col1'],df['Col2'])):
   if(lst1 == lst2):
      print('100% Matched')
   else:
      c1 = Counter(lst1)
      c2 = Counter(lst2)
      matching = {k: c1[k]+c2[k] for k in c1.keys() if k in c2}
      text = '% Matched'
      if len(lst1) > len(lst2):
         out = round(len(matching)/len(lst1)*100)
         #df['Matched Percent'].append(out,'% Matched')
         print(out,'% Matched')
      else:
         out = round(len(matching)/len(lst2)*100)
         #df['Matched Percent'].append(out,'% Matched')
         print(out,'% Matched')

80 % Matched
62 % Matched

TypeError: cannot concatenate object of type ""; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

I keep getting TypeError. I tried a couple of ways but no luck. I am able to print the values in the way I want in my screen as shown above. But when I append it to my dataframe df it fails. Appreciate advice on how to solve this.

jpp · Accepted Answer

Your logic seems verbose. You can use a list comprehension:

zipper = zip(map(set, df['Col1']), map(set, df['Col2']))
df['Matched Percent'] = [len(c1 & c2) / max(len(c1), len(c2)) for c1, c2 in zipper]

print(df)

                                                Col1  \
0                 [Phone, Watch, Pen, Pencil, Knife]   
1  [apple, orange, mango, cherry, banana, kiwi, t...   

                                                Col2  Matched Percent  
0                  [Phone, Watch, Pen, Pencil, fork]            0.800  
1  [orange, avocado, kiwi, mango, grape, lemon, t...            0.625

Note there isn't much scope for optimizing such calculations with Pandas, which isn't designed to hold lists in series. If you need "pretty" output, you can use f-strings, supported in Python 3.6+:

print((df['Matched Percent']*100).map(lambda x: f'{x:.0f}% Matched'))

0    80% Matched
1    62% Matched
Name: Matched Percent, dtype: object

Append column in a dataframe with string values

Answers (2)

Related Questions