Abhishek Prasar
Abhishek Prasar

Reputation: 117

Compare column text pattern with defined list and return 1st match string from defined list to a new column in dataframe

Suppose I have coffee shop menu list. I want to take text and return quantity and item name.

menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']

Now I want to extract number and the ordered item name matching from my menu(Any 1st match from menu)

Example Text : Bring 1 Capputino

Output dataframe :

      text                          Quantity                   match

     Bring 1 Capputino                 1                     Cappuccino  

Not necessary text entered spelling will be exact same as menu so it will just return the matching pattern from menu list in match column.

I have written below code but its returning Nan in match column. Appreciate any guidance.

Code:

    import pandas as pd
    import numpy as np
    import re

    def ccd():
    global df

menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']

for i in range(len(menu)):
    menu[i] = menu[i].upper()


order = input('Enter a substring: ').upper()



args_dict = {'CAPUCINO':'CAPPUCCINO',
             "COFFI":"COFFEE",
             "COOKI":"COOKIE" } 
#order=order.split()

for i,j in enumerate(order):
    if j in args_dict:
        order[i]=args_dict[j]
df = pd.DataFrame({'text':[order]})
df["Quantity"] = df.text.str.extract('(\d+)')
df['match'] = df.text.str.extract('(' + '|'.join(menu) + ')')

Upvotes: 1

Views: 76

Answers (1)

Timus
Timus

Reputation: 11351

Take a look at the following:

import re

menu_map = {'cap': 'Cappucino',
            'caf': 'Café Latte',
            "cof": "Irish coffee",
            "cok": "Cookie",
            "cook": "Cookie"} 

order = input('Enter a substring: ')

df = pd.DataFrame({'Text': [order]})
df["Quantity"] = df.Text.str.extract('(\d+)')
df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
df['Replacement'] = df.Match.str.casefold().map(menu_map)

Result for order == 'Bring 1 Caputino'

               Text Quantity Match Replacement
0  Bring 1 Caputino        1   Cap   Cappucino

and order == 'Bring 1 Caxutino'

               Text Quantity Match Replacement
0  Bring 1 Caxutino        1   NaN         NaN

because there's no pattern in menu_map that catches 'Caxutino'.

It looks to me that this is what you're essentially looking for? Since you don't want the Replacement column (I have only used it for transparency) you would do:

df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
df.Match = df.Match.str.casefold().map(menu_map)

(I don't understand what you are trying to achieve with the for ... if ... part.)

EDIT: Now that I understand the for ... if ... part, I'd propose the following approach:

args_dict = {'capu': 'Cappuccino', 'chap': 'Cappuccino', 
             'cof': 'Coffee', 'coof': 'Coffee', 'chof': 'Coffee',
             'cok': 'Cookie', 'chok': 'Cookie', 'choo': 'Cookie'}

order = order.split()
for i, word in enumerate(order):
    word = word.casefold()
    for key in args_dict:
        if word.startswith(key):
            order[i] = args_dict[key]
            break
order = ' '.join(order)

Or:

args_dict = {('capu', 'chap'): 'Cappuccino', 
             ('cof', 'coof', 'chof'): 'Coffee',
             ('cok', 'chok', 'choo'): 'Cookie'}

order = order.split()
for i, word in enumerate(order):
    word = word.casefold()
    for keys, replacement in args_dict.items():
        for key in keys:
            if word.startswith(key):
                order[i] = replacement
                break
order = ' '.join(order)

Upvotes: 1

Related Questions