James Gessel

Reputation: 39

How to categorize a list of data by keyword in python?

I have a list of transactions, including things like

"AMZN mktp US*MH434G300",
"HEALTH CARE WEB PMT",
"ARBYS #4323"

etc., and I want to write a program that looks for keywords in these descriptions and assigns a category based on those keywords. Surprisingly, I haven't found anything like this in my internet searches; I suppose that may be because it's difficult to do.

What I have done so far is something like this:

def getCategory(description):
    cat = ''
    if 'AMZN' in description:
        cat = 'shopping'
    elif 'ARBYS' in description:
        cat = 'restaurant'
    return cat

While this does work, it's extremely painstaking, since I have to write a separate if statement for each and every keyword. There has to be a better way to do this. Is there a library for something like this? Even just a way to add a bunch of keywords to a list and then use that list in the if statement would be amazing.
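The list-based idea can be sketched with the built-in `any()`, which returns True when at least one item of an iterable is truthy. This is a minimal sketch, assuming one keyword list per category; the list names and the extra keywords `TARGET` and `SUBWAY` are hypothetical placeholders:

```python
# Hypothetical keyword lists, one per category; extend them as needed
SHOPPING_KEYWORDS = ['AMZN', 'TARGET']
RESTAURANT_KEYWORDS = ['ARBYS', 'SUBWAY']

def getCategory(description):
    # any() is True if at least one keyword is a substring of the description
    if any(keyword in description for keyword in SHOPPING_KEYWORDS):
        return 'shopping'
    elif any(keyword in description for keyword in RESTAURANT_KEYWORDS):
        return 'restaurant'
    return ''  # No keyword matched: same default as the original function
```

Each category still needs one branch, but adding a keyword only means adding a string to a list.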

I'm not worried about speed/efficiency, as there isn't an insane amount of data (a few thousand entries). I'm using Python 3. I am very open to any learning experience, as I am trying to learn more about this kind of thing. Any suggestions are extremely welcome and appreciated. Thanks!

Upvotes: 0

Views: 2713

Answers (3)

Ryan S

Reputation: 479

Here is some sample code, adapted from this answer, that may be helpful: https://stackoverflow.com/a/33406474/13124888

Before diving into the code, I would highly recommend looking at re (the regular expressions module), a powerful part of Python's standard library that you can use to find keywords, replace text patterns, and more. You can find the documentation for this module here: https://docs.python.org/3/library/re.html.
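As a small illustration, `re.findall` with the pattern `\w+` returns every run of letters, digits, and underscores, which splits a transaction description into separate words. The snippet below uses the same call; here it is applied to the question's sample descriptions:

```python
import re

descriptions = ["AMZN mktp US*MH434G300", "HEALTH CARE WEB PMT", "ARBYS #4323"]

for description in descriptions:
    # \w+ matches runs of word characters; '*', '#' and spaces act as separators
    print(re.findall(r'\w+', description))

# Prints:
# ['AMZN', 'mktp', 'US', 'MH434G300']
# ['HEALTH', 'CARE', 'WEB', 'PMT']
# ['ARBYS', '4323']
```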

The code snippet below is based on the code in the linked post:

import re

# Keyword --> category dict; add more entries here
matches_to_category = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}
matches_list = list(matches_to_category)  # Keywords list, taken from the dict keys

def match(input_string, string_list):
    """Return the category of every keyword found in input_string."""
    words = re.findall(r'\w+', input_string)  # Split the line into words
    # dict.fromkeys removes duplicate keywords but keeps the order they appear in
    keywords = dict.fromkeys(word for word in words if word in string_list)
    return [matches_to_category[keyword] for keyword in keywords]

>>> sentence = "AMZN is great for shopping; ARBYS has the meats!"
>>> match(sentence, matches_list)
['shopping', 'restaurant']
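A variation on the same idea, sketched under the assumption that all keywords live in a single dict (the names `KEYWORD_TO_CATEGORY` and `match_categories` are hypothetical): joining the escaped keywords into one alternation wrapped in `\b` word boundaries lets a single `findall` call locate every keyword. `re.escape` guards against keywords that contain regex metacharacters such as `*`.

```python
import re

# Hypothetical keyword --> category mapping
KEYWORD_TO_CATEGORY = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}

# Builds a pattern like r'\b(?:ARBYS|AMZN)\b'. Longer keywords go first so a
# longer entry is tried before any shorter keyword that is its prefix.
PATTERN = re.compile(
    r'\b(?:'
    + '|'.join(re.escape(k) for k in sorted(KEYWORD_TO_CATEGORY, key=len, reverse=True))
    + r')\b'
)

def match_categories(description):
    """Return the categories of all keywords found, in order of appearance."""
    found = PATTERN.findall(description)
    # dict.fromkeys drops repeated keywords but keeps their order
    return [KEYWORD_TO_CATEGORY[keyword] for keyword in dict.fromkeys(found)]
```

Unlike a plain substring test, the `\b` boundaries stop a keyword such as `AMZN` from matching inside a longer token like `XAMZNX`.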

Upvotes: 0

narendra-choudhary

Reputation: 4818

I have to write a separate if statement for each and every keyword. There has to be a better way to do this.

You can use a dictionary to store the mapping of keywords to categories, then iterate over the dict to find a match.

categories_dict = {"AMZN": "shopping", "ARBYS": "restaurant"}

def get_category(description):
    for key, category in categories_dict.items():
        if key in description:
            return category
    return None
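Bank exports do not always capitalize descriptions consistently (for instance `Arbys` versus `ARBYS`). A small extension of the dict approach, assuming every key is stored in upper case, normalizes the description with `str.upper()` before the substring test and returns a default label instead of `None`; the `default` parameter and the `"uncategorized"` label are hypothetical additions:

```python
categories_dict = {"AMZN": "shopping", "ARBYS": "restaurant"}  # keys in upper case

def get_category(description, default="uncategorized"):
    # Compare in upper case so 'Arbys' and 'ARBYS' both match the key 'ARBYS'
    normalized = description.upper()
    for keyword, category in categories_dict.items():
        if keyword in normalized:
            return category  # First matching keyword (in dict order) wins
    return default
```

Since Python 3.7 dicts keep insertion order, so when a description contains several keywords, the one listed first in `categories_dict` decides the category.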

Upvotes: 0

ZeOnlyOne

Reputation: 179

While this is still slightly tedious, it's less tedious than your solution. I would use a dictionary to assign each keyword to a specific group. I would write it like this:

def getCategory(description):
    my_dict = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}
    for i in my_dict:
        if i in description:
            return my_dict[i]
    return None  # Return None if none of the keywords are in the description
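To apply a function like this to the whole list of transactions, a simple loop is enough. This sketch reuses the dict-based `getCategory` and groups descriptions by category with `collections.defaultdict`; the `transactions` list and the `'uncategorized'` label are illustrative assumptions:

```python
from collections import defaultdict

def getCategory(description):
    my_dict = {'AMZN': 'shopping', 'ARBYS': 'restaurant'}
    for keyword in my_dict:
        if keyword in description:
            return my_dict[keyword]
    return None  # Return None if none of the keywords are in the description

transactions = ["AMZN mktp US*MH434G300", "HEALTH CARE WEB PMT", "ARBYS #4323"]

# Map each category to the list of descriptions that fell into it
by_category = defaultdict(list)
for description in transactions:
    by_category[getCategory(description) or 'uncategorized'].append(description)

for category, items in by_category.items():
    print(category, items)
```

Collecting unmatched descriptions under one label makes it easy to see which keywords still need to be added.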

Upvotes: 1
