Reputation: 462
Sorry if this seems like a silly question, but I am still new to Python and spaCy.
I have a data frame that contains customer complaints. It looks a bit like this:
import pandas as pd

df = pd.DataFrame(
    [[1, 'I was waiting at the bus stop and then suddenly the car mounted the pavement'],
     [2, 'When we got on the bus, we went upstairs but the bus braked hard and I fell'],
     [3, 'The bus was clearly in the wrong lane when it crashed into my car']],
    columns=['ID', 'Text'])
If I want to obtain the noun phrases, then I can do this:
import spacy
nlp = spacy.load('en_core_web_sm')  # assumption: any English pipeline with a parser

def extract_noun_phrases(text):
    return [(chunk.text, chunk.label_) for chunk in nlp(text).noun_chunks]

def add_noun_phrases(df):
    df['noun_phrases'] = df['Text'].apply(extract_noun_phrases)

add_noun_phrases(df)
What if I want to extract prepositional phrases from the df? Specifically, I am trying to extract phrases like:
at the bus stop
in the wrong lane
I know I am meant to be using subtree for this, but I don't understand how to apply it to my dataset.
Upvotes: 0
Views: 761
Reputation: 3530
A prepositional phrase is simply a preposition followed by a noun phrase.
Since you already know how to identify noun phrases using noun_chunks, it may be as simple as checking the token immediately before each noun phrase. If preceding_token.pos_ is 'ADP' (ADP means adposition, and a preposition is a type of adposition), then you have probably found a prepositional phrase.
Instead of checking pos_, you could check whether preceding_token.dep_ is 'prep'. It depends on which components of the spaCy pipeline you have enabled, but the results should be similar.
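Here is a minimal sketch of that idea, in case it helps. The names extract_prep_phrases and use_dep are my own (hypothetical, not part of spaCy); the function expects a Doc that has already been parsed, because noun_chunks needs the dependency parse. It only looks at the single token before each noun chunk, so multi-word prepositions such as "in front of" would only be captured from "of" onwards.

```python
from spacy.tokens import Doc


def extract_prep_phrases(doc: Doc, use_dep: bool = False) -> list:
    """Return 'preposition + noun chunk' strings found in a parsed Doc.

    Hypothetical helper: checks only the token directly before each chunk.
    """
    phrases = []
    for chunk in doc.noun_chunks:
        if chunk.start == 0:
            continue  # nothing precedes a chunk at the very start of the text
        preceding_token = doc[chunk.start - 1]
        if use_dep:
            is_prep = preceding_token.dep_ == "prep"
        else:
            is_prep = preceding_token.pos_ == "ADP"
        if is_prep:
            # Span from the preposition through the end of the noun chunk
            phrases.append(doc[preceding_token.i:chunk.end].text)
    return phrases
```

With a loaded pipeline nlp, you could then apply it to your data frame along the lines of df['prep_phrases'] = [extract_prep_phrases(doc) for doc in nlp.pipe(df['Text'])]. What you actually get back depends on the model's tagging and parsing, so check the output on a few rows.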
Upvotes: 3