Lixiang Wei

Reputation: 69

How to remove adjectives or attributives before a noun?

Currently I am using nltk to remove all the adjectives. This is my attempt:

import nltk

def remove_adj(sentence):
  adjective_tags = ["JJ", "JJR", "JJS"]
  tokens = nltk.word_tokenize(sentence)
  tags = nltk.pos_tag(tokens)
  # Keep every word whose tag is not an adjective tag
  words = [word for word, pos in tags if pos not in adjective_tags]
  return ' '.join(words)

But this is not quite what I need. Here are some examples of the output I want:

input: "who has the highest revenue" output: "who has the revenue"

input: "who earned more than average income" output: "who earned more than income"

input: "what is the mean of profit" output: "what is the profit"

Can anyone give me some suggestions? Thanks all in advance.

Upvotes: 0

Views: 1461

Answers (1)

I think I understand what you are trying to achieve, but what problem are you having? I've run your code and it appears to work perfectly at removing adjectives.

A couple of things are throwing me off, though. For the input/output below, you can expect the word 'more' to be removed, as it is an adjective with the tag 'JJR'. Your post suggests that you were not expecting it to be removed.

input: "who earned more than average income" output: "who earned more than income"

Also, I'm not sure why you were expecting the word 'mean' to be removed in the input/output below, as it isn't an adjective.

input: "what is the mean of profit" output: "what is the profit"

A great place to check your sentences is Parts of Speech.
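The same check can be done locally by printing each word next to its tag. A minimal sketch: `format_tags` is a hypothetical helper, and the pairs below are hand-written for illustration, following the tags discussed above rather than captured tagger output. With NLTK and its tokenizer and tagger data installed, the pairs would come from `nltk.pos_tag(nltk.word_tokenize(sentence))`.

```python
def format_tags(tagged):
    """Render (word, tag) pairs as 'word/TAG' tokens on one line."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)

# Hand-written pairs (an assumption for illustration, not real tagger output).
tagged = [("who", "WP"), ("earned", "VBD"), ("more", "JJR"),
          ("than", "IN"), ("average", "JJ"), ("income", "NN")]

print(format_tags(tagged))
```

Reading the tags this way makes it easy to see which words a filter on 'JJ', 'JJR' and 'JJS' would drop.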

Below are the actual outputs of your code, which removes the adjectives correctly:

input: "who has the highest revenue" output: "who has the revenue"

input: "who earned more than average income" output: "who earned than income"

input: "what is the mean of profit" output: "what is the mean of profit"

If you are simply trying to remove any descriptive elements pertaining to the noun, I would have to ask more about your problem. Your examples all end with a noun, and this appears to be the noun you are focusing on. Will this be the case with all sentences that this code would handle?

If so, you might consider iterating through your sentence backwards, since the noun is then the first thing you find. As you step through, you would check whether the noun has a determiner (a, an, the) with tag 'DT', as from what I see you wouldn't want to remove that. You would continue to step through, removing everything until you reach an adjective or another noun. I don't know what your actual rules are for removing words here, but working backwards may help.

EDIT:

I tinkered with this a bit and got the code below to produce exactly the outputs you wanted. You can add tags to the 'stop_tags' variable if there are other part-of-speech tags you want it to stop on.

import nltk

def remove_adj(sentence):
    # Tags that end the removal once the final noun has been found
    stop_tags = ["JJ", "JJR", "JJS", "NN"]
    tokens = nltk.word_tokenize(sentence)
    # Walk the tagged sentence from the last word to the first
    tags = list(reversed(nltk.pos_tag(tokens)))
    noun_located = False
    stop_reached = False
    kept = []

    for word, pos in tags:
        if not noun_located and pos == 'NN':
            # Keep the last noun in the sentence
            noun_located = True
            kept.append(word)
        elif not stop_reached and pos in stop_tags:
            # Drop the first adjective or noun met after that, then keep the rest
            stop_reached = True
        elif stop_reached:
            kept.append(word)
        # Any other word (between the noun and the stop tag) is dropped

    # Restore the original word order
    return ' '.join(reversed(kept))

x = remove_adj('what is the mean of profit')
print(x)
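To try the backward scan without running a tagger, the rule can be separated from the tagging step. The following is only a sketch of a variant, not the function above: `drop_modifier_before_last_noun` and its `stop_tags` parameter are hypothetical names, it accepts any tag starting with 'NN' as the final noun, it only starts looking for a stop tag once that noun has been found, and the tagged sentences are hand-written assumptions rather than real tagger output.

```python
def drop_modifier_before_last_noun(tagged, stop_tags=("JJ", "JJR", "JJS", "NN")):
    """Scan (word, tag) pairs backwards: keep the last noun, drop words up to
    and including the first stop-tagged word before it, keep everything else."""
    kept = []
    noun_located = False
    stop_reached = False
    for word, pos in reversed(tagged):
        if not noun_located:
            # Assumption: any noun tag (NN, NNS, NNP, NNPS) counts as the noun.
            if pos.startswith("NN"):
                noun_located = True
                kept.append(word)
        elif not stop_reached:
            # Words between the noun and the stop tag are dropped.
            if pos in stop_tags:
                stop_reached = True
        else:
            kept.append(word)
    return " ".join(reversed(kept))


# Hand-written tags for the three example sentences (assumptions).
examples = [
    [("who", "WP"), ("has", "VBZ"), ("the", "DT"),
     ("highest", "JJS"), ("revenue", "NN")],
    [("who", "WP"), ("earned", "VBD"), ("more", "JJR"),
     ("than", "IN"), ("average", "JJ"), ("income", "NN")],
    [("what", "WP"), ("is", "VBZ"), ("the", "DT"),
     ("mean", "NN"), ("of", "IN"), ("profit", "NN")],
]

for tagged in examples:
    print(drop_modifier_before_last_noun(tagged))
```

Passing a different `stop_tags` tuple is one way to experiment with which tags should end the removal.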


Upvotes: 2
