Reputation: 117
Suppose I have coffee shop menu list. I want to take text and return quantity and item name.
menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']
Now I want to extract number and the ordered item name matching from my menu(Any 1st match from menu)
Example Text : Bring 1 Capputino
Output dataframe :
text Quantity match
Bring 1 Capputino 1 Cappuccino
Not necessary text entered spelling will be exact same as menu so it will just return the matching pattern from menu list in match column.
I have written below code but its returning Nan in match column. Appreciate any guidance.
import pandas as pd
import numpy as np
import re
def ccd():
global df
menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']
for i in range(len(menu)):
menu[i] = menu[i].upper()
order = input('Enter a substring: ').upper()
args_dict = {'CAPUCINO':'CAPPUCCINO',
"COFFI":"COFFEE",
"COOKI":"COOKIE" }
#order=order.split()
for i,j in enumerate(order):
if j in args_dict:
order[i]=args_dict[j]
df = pd.DataFrame({'text':[order]})
df["Quantity"] = df.text.str.extract('(\d+)')
df['match'] = df.text.str.extract('(' + '|'.join(menu) + ')')
Upvotes: 1
Views: 76
Reputation: 11351
Take a look at the following:
import re
menu_map = {'cap': 'Cappucino',
'caf': 'Café Latte',
"cof": "Irish coffee",
"cok": "Cookie",
"cook": "Cookie"}
order = input('Enter a substring: ')
df = pd.DataFrame({'Text': [order]})
df["Quantity"] = df.Text.str.extract('(\d+)')
df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
df['Replacement'] = df.Match.str.casefold().map(menu_map)
Result for order == 'Bring 1 Caputino'
Text Quantity Match Replacement
0 Bring 1 Caputino 1 Cap Cappucino
and order == 'Bring 1 Caxutino'
Text Quantity Match Replacement
0 Bring 1 Caxutino 1 NaN NaN
because there's no pattern in menu_map
that catches 'Caxutino'
.
It looks to me that this is what you're essentially looking for? Since you don't want the Replacement
column (I have only used it for transparency) you would do:
df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
df.Match = df.Match.str.casefold().map(menu_map)
(I don't understand what you are trying to achieve with the for ... if ...
part.)
EDIT: Now that I understand the for ... if ...
part, I'd propose the following approach:
args_dict = {'capu': 'Cappuccino', 'chap': 'Cappuccino',
'cof': 'Coffee', 'coof': 'Coffee', 'chof': 'Coffee',
'cok': 'Cookie', 'chok': 'Cookie', 'choo': 'Cookie'}
order = order.split()
for i, word in enumerate(order):
word = word.casefold()
for key in args_dict:
if word.startswith(key):
order[i] = args_dict[key]
break
order = ' '.join(order)
Or:
args_dict = {('capu', 'chap'): 'Cappuccino',
('cof', 'coof', 'chof'): 'Coffee',
('cok', 'chok', 'choo'): 'Cookie'}
order = order.split()
for i, word in enumerate(order):
word = word.casefold()
for keys, replacement in args_dict.items():
for key in keys:
if word.startswith(key):
order[i] = replacement
break
order = ' '.join(order)
Upvotes: 1