Kartik Singh
Kartik Singh

Reputation: 1

Automatic tagging of words or phrases

I want to automatically tag a word/phrase with one of the defined words/phrases from a list. My list contains about 230 words in columnA which are tagged in columnB. There are around 16 unique tags and every of those 230 words are tagged with one of these 16 tags.

Have a look at my list:

The words/phrases in column A are tagged as words/phrases in column B. enter image description here

From time to time, new words are added for which tag has to be given manually. I want to build a predictive algorithm/model to tag new words automatically(or suggest). So if I write a new word, let say 'MIP Reserve' (A36), then it should predict the tag as 'Escrow Deposits'(B36) and not 'Operating Reserve'(B33). How should I predict the tags of new word precisely even if the words do not match with the words in its actual tag? If someone is willing to see the full list, I can happily share.

Upvotes: 0

Views: 1457

Answers (1)

Jonathan
Jonathan

Reputation: 1025

Short version

I think your question is a little ill-defined and doesn't have a short coding or macro answer. Given that each item contains such little information, I don't think it is possible to build a good predictive model from your source data. Instead, do the tagging exercise once and look at how you control tagging in the future.

Long version

Here are the steps I would take to create a predictive model and why I don't think you can do this.

  1. Understand why you want to have a predictive program at all

Why do you need a predictive program? Are you sorting through hundreds or thousands of records, all of which are changing and need tagging? If so, I agree, you wouldn't want to do this manually.

If this is a one-off exercise, because over time the tags have become corrupted from their original meaning, your problem is that your tags have become corrupted, not that you need to somehow predict where each item should be tagged. You should be looking at controlling use of the tags, not at predicting how people in the future might mistag or misname something.

Don't forget that there are lots of tools in Excel to make the problem easier. Let's say you know for certain that all items with 'cash' definitely go to 'Operating Cash'. Put an AutoFilter on the list and filter on the word 'cash' - now just copy and paste 'Operating Cash' next to all of these. This way, you can quickly get rid of the obvious ones from your list and focus on the tricky ones.

  1. Understand the characteristics of the tags you want to use.

Take time to look at the tags you are using - what do each of them mean? What are the unique features or combinations of features that this tag is representing?

For example, your tag 'Operating Cash' carries the characteristics of being cash (i.e. not tied up so available for use fairly quickly) and as being earmarked for operations. From these, we could possibly derive further characteristics that it is held in a certain place, or a certain person has responsibility for it.

If you had more source data to go on, you could perhaps use fields such as 'year created', or 'customer' to help you categorise further.

  1. Understand what it is about the items you want to tag that could give you an idea of where they should go.

This is your biggest problem. A quick example - what in the string "MIP Reserve" gives any clues that it should be linked to "Escrow Deposits"? You have no easy way of matching many of the items in your list - many words appear in multiple items across multiple tags.

However, try and look for unique identifiers that will give you clues - for example, all items with the word 'developer' seem to be tagged to 'Developer Fee Note & Interest'. Do you have any more of those? Use these to reduce your problem, since they should be a straightforward mapping.

Any unique identifiers will allow you to set up rules for these strings. You don't even need to stick to one word - perhaps when you see several words, you can narrow down where it will end up e.g. when I see 'egg' this could go into 'bird' or 'reptile', but if 'egg' is paired with 'wing', I can be fairly confident it's 'bird'.

You need to match the characteristics of the items you want to tag with the unique identifiers of the tags you developed in step 1.

  1. Write a program or macro to look for the identifiers in step 2 and return the relevant tag from step 1.

This is the straightforward bit. Look for the identifiers you want (e.g. uses 'cash', contains tag 'Really Important Customer') and look for the best match in the tags you have earlier.

Ensure you catch any errors - what happens if no tag is found? Does it create a new one? Does it recommend contacting you for help? What happens if more than one tag is relevant? What are your tiebreaking criteria?

But be aware of...

  1. Understand how you will control use of these unique identifiers.

Imagine you somehow manage to come up with a list of unique identifiers. How will you control their use? If you have decided to send any item with the word 'cash' to the tag 'Operating Cash' and then in a year, someone comes along and makes an item 'Capital Cash', because they want somewhere to put cash that is about to be spent on capital items, how do you stop this? How are you going to control use of these words?

You will effectively need to take control of the item naming system and set up an agreed list of identifying words. Whenever anyone makes an item, they need to include your identifiers somewhere. I can tell you that this will not work. Either they will use the wrong words and you will end up manually doing it anyway, or they will ring you up confused and you will end up manually doing it anyway.

If you are the only person doing this, just do the exercise once, to your own standard (that you record) and stick to that standard. When you need to hand it over, it's clearly ordered and makes sense. If more than one person is doing this, do the exercise once between you and the team and then agree a way of controlling it.

Writing a predictive program sounds great and might save you some time. But consider why you are writing it. Are you likely to need to tag accounts constantly in the future? If so, control their naming centrally and make it so a tag is mandatory when they are made. If not, why are you writing a program to do this? Just do it once, manually.

Upvotes: 2

Related Questions