Reputation: 39
I have a list of transactions, including things like
"AMZN mktp US*MH434G300",
"HEALTH CARE WEB PMT",
"ARBYS #4323"
etc., and I want to write a program that looks for keywords in these descriptions and assigns a category based on them. Surprisingly, I haven't found anything like this in my internet searches, and I suppose it's possible that's because it's difficult to do.
What I have done so far is something like this:
def getCategory(description):
    cat = ''
    if 'AMZN' in description:
        cat = 'shopping'
    elif 'ARBYS' in description:
        cat = 'restaurant'
    return cat
While this does work, it's extremely painstaking, and I have to write a separate if statement for each and every keyword. There has to be a better way to do this. Is there a library for something like this? Even just a way to add a bunch of keywords to a list and then use that list in the if statement would be amazing.
I'm not worried about speed/efficiency, as there isn't an insane amount of data (a few thousand entries). I'm using Python 3. I am very open to any learning experience, as I am trying to learn more about this kind of stuff. Any suggestions are extremely welcome and appreciated. Thanks!
Upvotes: 0
Views: 2713
Reputation: 479
Using the linked answer, here is some sample code that may be helpful: https://stackoverflow.com/a/33406474/13124888 (reference).
Before diving into the code, I would highly recommend looking at re (short for regular expressions), a powerful module in Python's standard library that you can use for finding keywords, swapping out text patterns, etc. You can find the documentation for this module here: https://docs.python.org/3/library/re.html.
Please also see the code snippet below, which is based off of the code in the linked post:
import re

matches_list = ['AMZN', 'ARBYS']  # Keywords list; extend with more keywords ...
matches_to_category = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}  # keyword --> type dict; extend with more pairs ...

def match(input_string, string_list):
    cat = []  # Categories found for this line
    words = re.findall(r'\w+', input_string)
    # De-duplicate the matched keywords while keeping their order of appearance
    keywords = [word for word in dict.fromkeys(words) if word in string_list]
    for keyword in keywords:  # Iterate over keywords found for a line
        cat.append(matches_to_category[keyword])  # Look up the keyword's category
    return cat

>>> sentence = "AMZN is great for shopping; ARBYS has the meats!"
>>> match(sentence, matches_list)
['shopping', 'restaurant']
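Since I recommended re above, here is a rough sketch of how the lookup could be done with a single compiled pattern instead of splitting the line into words. The names KEYWORD_TO_CATEGORY, KEYWORD_PATTERN and categorize are made up for this example, and it assumes every keyword in the dict is written in uppercase:

```python
import re

# Hypothetical keyword -> category table; keys are assumed to be uppercase.
KEYWORD_TO_CATEGORY = {"AMZN": "shopping", "ARBYS": "restaurant"}

# Build one alternation pattern such as \b(AMZN|ARBYS)\b.
# re.escape guards against keywords containing regex metacharacters.
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORD_TO_CATEGORY) + r")\b",
    re.IGNORECASE,
)

def categorize(description):
    """Return the category of the leftmost keyword in description, or None."""
    found = KEYWORD_PATTERN.search(description)
    if found is None:
        return None
    # Normalize the matched text so it lines up with the uppercase dict keys.
    return KEYWORD_TO_CATEGORY[found.group(1).upper()]
```

With this sketch, categorize("arbys #4323") should also match, because the pattern ignores case. The \b word boundaries mean a keyword only matches as a whole word, which may or may not be what you want for your data.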
Upvotes: 0
Reputation: 4818
I have to write a separate if statement for each and every keyword. There has to be a better way to do this.
You can use a dictionary to store a mapping of keywords to categories, and iterate over the dict to find a match.
categories_dict = {"AMZN": "shopping", "ARBYS": "restaurant"}

def get_category(description):
    for key in categories_dict:
        if key in description:
            return categories_dict[key]
    return None
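If several keywords belong to the same category, one possible variation is to flip the mapping so each category holds a list of keywords, which is close to the "bunch of keywords in a list" idea from the question. This is only a sketch: CATEGORY_KEYWORDS and get_category_grouped are hypothetical names, and the keywords other than AMZN and ARBYS are invented for illustration.

```python
# Hypothetical category -> keywords table. WALMART and MCDONALDS are
# made-up examples and do not come from the question's data.
CATEGORY_KEYWORDS = {
    "shopping": ["AMZN", "WALMART"],
    "restaurant": ["ARBYS", "MCDONALDS"],
}

def get_category_grouped(description):
    """Return the first category with a keyword in description, else None."""
    text = description.upper()  # compare in uppercase so case does not matter
    for category, keywords in CATEGORY_KEYWORDS.items():
        # any() stops at the first keyword that appears as a substring
        if any(keyword in text for keyword in keywords):
            return category
    return None
```

Adding a new keyword then only means appending it to the right list, with no change to the function.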
Upvotes: 0
Reputation: 179
While this is still slightly tedious, it's less tedious than your solution. I would use a dictionary to assign each keyword to a specific group. I would write it like this:
def getCategory(description):
    my_dict = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}
    for i in my_dict:
        if i in description:
            return my_dict[i]
    return None  # Return None if none of the keywords are in the description
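To run this over the whole list, something like the sketch below might help. It groups the descriptions by category and puts everything without a keyword under "uncategorized", so you can see which keywords are still missing. The names KEYWORDS, get_category and group_by_category, and the "uncategorized" label, are my own assumptions; the sample data is from the question.

```python
from collections import defaultdict

# Same keyword -> category idea as above, kept at module level for reuse.
KEYWORDS = {"AMZN": "shopping", "ARBYS": "restaurant"}

def get_category(description):
    """Return the category of the first keyword found, or None."""
    for keyword, category in KEYWORDS.items():
        if keyword in description:
            return category
    return None

def group_by_category(descriptions):
    """Map each category to the list of descriptions that fall under it."""
    groups = defaultdict(list)
    for description in descriptions:
        # Descriptions with no matching keyword are collected separately.
        groups[get_category(description) or "uncategorized"].append(description)
    return dict(groups)

transactions = ["AMZN mktp US*MH434G300", "HEALTH CARE WEB PMT", "ARBYS #4323"]
grouped = group_by_category(transactions)
```

Here "HEALTH CARE WEB PMT" ends up under "uncategorized", which tells you a keyword for it still needs to be added to the dictionary.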
Upvotes: 1