val

Reputation: 33

Python NLP: identifying the tense of a sentence using TextBlob, StanfordNLP or Google Cloud

(Note: I am aware that there have been previous posts on this question (e.g. here or here), but they are rather old, and I think there has been quite some progress in NLP in the past few years.)

I am trying to determine the tense of a sentence, using natural language processing in Python.

Is there an easy-to-use package for this? If not, how would I need to implement solutions in TextBlob, StanfordNLP or Google Cloud Natural Language API?

TextBlob seems easiest to use, and I managed to get the POS tags listed, but I am not sure how I can turn the output into a 'tense prediction value' or simply a best guess on the tense. Moreover, my text is in Spanish, so I would prefer to use Google Cloud or StanfordNLP (or any other easy-to-use solution) that supports Spanish.

I have not managed to work with the Python interface for StanfordNLP.

Google Cloud Natural Language API seems to offer exactly what I need (see here), but I have not managed to find out how to get this output. I have used Google Cloud NLP for other analyses (e.g. entity sentiment analysis) and it has worked, so I am confident I could set it up if I find the right usage example.

Example of textblob:

from textblob import TextBlob
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob = TextBlob("I am curious to see whether NLP is able to predict the tense of this sentence.", pos_tagger=nltk_tagger)
print(blob.pos_tags)

-> This prints the POS tags. How would I convert them into a prediction of the tense of this sentence?

Example with Google Cloud NLP (after setting up credentials):

from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types
text = "I am curious to see how this works"
client = language.LanguageServiceClient()
document = types.Document(
    content=text,
    type=enums.Document.Type.PLAIN_TEXT)

tense = (WHAT NEEDS TO COME HERE?)
print(tense)

-> I am not sure about the code that needs to be entered to predict the tense (indicated in the code)

I am quite a newbie to Python so any help on this topic would be highly appreciated! Thanks!

Upvotes: 1

Views: 4133

Answers (2)

Jaggz

Reputation: 21

I worked with ChatGPT to code this up (correcting it, and it advancing a bunch of it in ways that would have taken me forever to figure out). So far it works pretty well on the included tests, but it has some problems and could use some help.

The code allows detecting the main tense of a sentence (past, present, future, unknown), as well as that of an embedded/subordinate clause. I wanted it for assisting in time adjustment for a separate speech-to-text project -- for a sentence like "Jamie wants to have food in 3 hours.", where it's present tense, but the referenced time is in the future.

Most of the predicted tests actually work for my time-adjustment project, so I'm leaving those, but some others fail and I don't know how to handle them. For example, for "She wants to go to sleep." and "She wants to go sleep in 3 hours.", I'd want the embedded clause to be present and future, respectively. (The current code returns "unknown" for both.) [Screencap of part of the tests' output]

I'm thinking, if the main clause is present, and the embedded is unknown, I can place it in the future, but I'd like it to handle the grammar, not just the final "unknown" (unless that's all that's needed).
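
As a rough sketch of that fallback (it is not part of the current code; the helper name and the is_to_infinitive flag are hypothetical, and this is a rule of thumb rather than real grammar handling):

```python
def resolve_embedded_tense(main_tense, embedded_tense, is_to_infinitive=False):
    """Fill in an 'unknown' embedded tense from the main clause.

    is_to_infinitive is a hypothetical flag the caller would set when the
    embedded clause is a to-infinitive such as "to sleep at 4".
    """
    # Keep any tense that was actually detected, and do not guess
    # for clauses that are not to-infinitives.
    if embedded_tense != "unknown" or not is_to_infinitive:
        return embedded_tense
    if main_tense == "present":
        return "future"  # e.g. "She needs to sleep at 4."
    if main_tense == "past":
        return "past"    # e.g. "She needed to sleep at 4."
    return "unknown"
```

With this, a present main clause plus an undetected to-infinitive comes out as "future", which matches the expected values in the test list, but it still only patches the final "unknown" instead of reading the grammar.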

Here's the current code. (Note that the bansi module is for term color codes and is here: https://gist.github.com/jaggzh/35b3705327ad9b4a3439014b8153384e)

#!/usr/bin/env python3
import spacy
from tabulate import tabulate
from bansi import *
import sys

nlp = spacy.load("en_core_web_sm")

def pe(*x, **y):
    print(*x, **y, file=sys.stderr)

def detect_tense(sentence):
    sent = list(nlp(sentence).sents)[0]
    root_tag = sent.root.tag_
    aux_tags = [w.tag_ for w in sent.root.children if w.dep_ == "aux"]
    # Detect past tense
    if root_tag == "VBD" or "VBD" in aux_tags:
        return "past"
    # Detect present tense
    if root_tag in ["VBG", "VBP", "VBZ"] or ("VBP" in aux_tags or "VBZ" in aux_tags):
        return "present"
    # Detect future tense (usually indicated by the auxiliary 'will' or 'shall')
    if any(w.lower_ in ["will", "shall"] for w in sent.root.children if w.dep_ == "aux"):
        return "future"
    return "unknown"

def extract_subtree_str(token):
    return ' '.join([t.text for t in token.subtree])

def detect_embedded_tense(sentence):
    doc = nlp(sentence)
    main_tense = "unknown"
    embedded_tense = "unknown"
    for sent in doc.sents:
        root = sent.root
        main_tense = detect_tense(sent.text)  # Detect main clause tense of this sentence
        for child in root.children:     # Detect embedded clause tense
            if child.dep_ in ["xcomp", "ccomp", "advcl"]:
                clause = extract_subtree_str(child)
                embedded_tense = detect_tense(clause)
    return main_tense, embedded_tense

def show_parts(sentence):
    doc = nlp(sentence)
    words = [''] + [str(token) for token in doc]
    tags = ['pos'] + [token.tag_ for token in doc]
    deps = ['dep'] + [token.dep_ for token in doc]
    print(tabulate([words, tags, deps]))

# def get_verb_tense(sentence):
#     doc = nlp(sentence)
#     for token in doc:
#         print(f"  tag_: {token.tag_}")
#         if "VERB" in token.tag_:
#             return token.tag_
#     return "No verb found"

if __name__ == '__main__':
    # Test the function
    sentences = [
        # (sentence, main_clause_expected_tense, embedded_clause_expected_tense)
        ("I ate an apple.", "past", "unknown"),
        ("I had eaten an apple.", "past", "unknown"),
        ("I am eating an apple.", "present", "unknown"),
        ("She needs to sleep at 4.", "present", "future"),
        ("She needed to sleep at 4.", "past", "past"),
        ("I ate an apple.", "past", "unknown"),
        ("I had eaten an apple.", "past", "unknown"),
        ("I am eating an apple.", "present", "unknown"),
        ("I eat an apple.", "present", "unknown"),
        ("I have been eating.", "present", "unknown"),
        ("I will eat an apple.", "future", "unknown"),
        ("I shall eat an apple.", "future", "unknown"),
        ("She will eat at 3.", "future", "unknown"),
        ("She ate at 3.", "past", "unknown"),
        ("She went to sleep at 4.", "past", "unknown"),
        ("She has to eat.", "future", "unknown"),
        ("She wants to go sleep.", "present", "future"),  # This could be debated
        ("She wants to go sleep in 3 hours.", "present", "future"),  # This could be debated
        ("She wanted to go sleep earlier.", "past", "past"),
        ("I want to be sleeping.", "present", "future"),  # This could be debated
        ("I am sleeping.", "present", "unknown"),
        ("She is eating.", "present", "unknown"),
    ]
    for s, exp_main_tense, exp_embedded_tense in sentences:
        print(f"{bgblu}{yel}-------------------------------------- {rst}")
        print(f"{bgblu}{yel} Sent: {s}{rst}")
        show_parts(s)
        det_main_tense, det_embedded_tense = detect_embedded_tense(s)
        print(f"   Main Pred-Tense: {yel}{det_main_tense}{rst}")
        print(f"   Main  Exp-Tense: {yel}{exp_main_tense}{rst}")
        if det_main_tense == exp_main_tense:
            print(f"                    {bgre}MATCH{rst}")
        else:
            print(f"                    {bred}MISMATCH{rst}")
        print(f"   Embedded Pred-Tense: {yel}{det_embedded_tense}{rst}")
        print(f"   Embedded  Exp-Tense: {yel}{exp_embedded_tense}{rst}")
        if det_embedded_tense == exp_embedded_tense:
            print(f"                        {bgre}MATCH{rst}")
        else:
            print(f"                        {bred}MISMATCH{rst}")

Upvotes: 1

Jindřich

Reputation: 11220

I don't think any NLP toolkit has a function to detect past tense right away. But you can simply get it from dependency parsing and POS tagging.

Do a dependency parse of the sentence and look at the root, which is the main predicate of the sentence, and at its POS tag. If it is VBD (a verb in the simple past form), it is surely past tense. If it is VB (base form) or VBG (a gerund or present participle), you need to check its dependency children for an auxiliary verb (deprel aux) with the VBD tag.

If you also need to cover present/past perfect or past modal expressions (I must have had...), you can just extend the conditions.
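
One way such an extension might look, written on plain tags so it does not depend on any particular toolkit. This is only a sketch: the function name, the label strings and the (word, tag) input format are assumptions for illustration, and the rules cover only a few English patterns.

```python
HAVE_FORMS = {"have", "has", "had", "'ve"}

def classify_verb_group(root_tag, aux):
    """Classify a verb group from the root's tag and its auxiliaries.

    aux is a list of (lowercased word, Penn Treebank tag) pairs for the
    auxiliary children of the root, in sentence order.
    """
    words = [word for word, _ in aux]
    tags = [tag for _, tag in aux]
    has_have = any(word in HAVE_FORMS for word in words)

    # Perfect constructions: a form of "have" plus a past participle root.
    if root_tag == "VBN" and has_have:
        if "MD" in tags:
            return "past modal"       # "I must have had ..."
        if "VBD" in tags:
            return "past perfect"     # "I had eaten"
        return "present perfect"      # "I have eaten"
    # Plain past: past-tense root or a past-tense auxiliary.
    if root_tag == "VBD" or "VBD" in tags:
        return "past"                 # "I ate", "I was eating"
    return "other"
```

For example, classify_verb_group("VBN", [("had", "VBD")]) falls into the past perfect branch, while a VBG root with the auxiliary ("was", "VBD") is reported as plain past.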

In spaCy (my favorite NLP toolkit for Python), you can write it like this (assuming your input is a single sentence):

import spacy
nlp = spacy.load('en_core_web_sm')

def detect_past_sentence(sentence):
    sent = list(nlp(sentence).sents)[0]
    return (
        sent.root.tag_ == "VBD" or
        any(w.dep_ == "aux" and w.tag_ == "VBD" for w in sent.root.children))
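
If you only have POS tags and no dependency parse, as with the TextBlob output in the question, a cruder variant is to take the first finite verb or modal in the tag list. The sketch below is a naive heuristic for English only; it assumes (word, tag) pairs with Penn Treebank tags in the format that blob.pos_tags returns, and the function name is made up for illustration.

```python
# "wo" is the first half of "won't" under Penn Treebank tokenization.
FUTURE_MODALS = {"will", "shall", "'ll", "wo"}

def guess_tense_from_pos_tags(pos_tags):
    """Return 'past', 'present', 'future' or 'unknown'.

    pos_tags is a list of (word, tag) pairs. The guess comes from the first
    finite verb or future modal, so subordinate clauses are ignored.
    """
    for word, tag in pos_tags:
        if tag == "MD" and word.lower() in FUTURE_MODALS:
            return "future"
        if tag == "VBD":
            return "past"
        if tag in ("VBP", "VBZ"):
            return "present"
    return "unknown"
```

Because it ignores sentence structure, this will be wrong whenever the first finite verb is not the main predicate, so the dependency-based check is the safer option when a parser is available.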

With the Google Cloud API or StanfordNLP, it would be basically the same; I am just not so familiar with those APIs.
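
For the Google Cloud side, a sketch that has not been run against the live service might look like the following. It assumes the 1.x google-cloud-language client with the enums/types interface used in the question (newer releases of the library changed that interface), and it relies on the syntax analysis response attaching a tense value to each token's part of speech. Taking the most frequent tense as the sentence tense is my own simplification, and both function names are hypothetical.

```python
from collections import Counter

def most_common_tense(tense_names):
    """Pick the most frequent tense label, ignoring tokens without a tense."""
    counts = Counter(name for name in tense_names if name != "TENSE_UNKNOWN")
    return counts.most_common(1)[0][0] if counts else "TENSE_UNKNOWN"

def google_sentence_tense(text, language_code="es"):
    """Ask the Natural Language API for token tenses and summarize them.

    Needs credentials and the google-cloud-language 1.x package.
    """
    # Imported lazily so most_common_tense works without the client library.
    from google.cloud import language
    from google.cloud.language import enums, types

    client = language.LanguageServiceClient()
    document = types.Document(
        content=text,
        language=language_code,
        type=enums.Document.Type.PLAIN_TEXT)
    response = client.analyze_syntax(document=document)
    # Each token carries morphology, including a tense enum value.
    names = [enums.PartOfSpeech.Tense(token.part_of_speech.tense).name
             for token in response.tokens]
    return most_common_tense(names)
```

In the question's snippet, the missing line would then be something like tense = google_sentence_tense(text, language_code="en").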

Upvotes: 9
