Reputation: 1
I'm working with a DataFrame in Python that has a column named 'POS_TAGS'. Each entry in this column is a list of tuples, where each tuple contains a word and its part-of-speech (POS) tag. Here is an example of the data structure in the 'POS_TAGS' column:
[
[('word1', 'NN'), ('word2', 'VB'), ('word3', 'NN')],
[('word4', 'JJ'), ('word5', 'NN')],
...
]
I would like to extract all words that have a specific POS tag (e.g., 'NN' for nouns) from this column and store them in a list. How can I do this efficiently?
I've attempted using list comprehensions, but I'm unsure if I'm approaching this correctly or efficiently.
Code Attempt
# Example code attempt
target_tag = 'NN'
all_words_with_target_tag = [
    word for row in df['POS_TAGS'] for word, tag in row if tag == target_tag
]
Is this the right approach? Are there better methods for handling this kind of task, especially if the DataFrame is large? Any guidance on optimizing this or explaining list comprehension usage here would be appreciated!
Upvotes: 0
Views: 21
Reputation: 317
This code creates a new column containing the words that have the 'NN' POS tag:
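import pandas as pd
# Note: in this example a single cell of 'POST' holds the whole list of tagged sentences
post = [[('word1', 'NN'), ('word2', 'VB'), ('word3', 'NN')], [('word4', 'JJ'), ('word5', 'NN')]]
df = pd.DataFrame({'TEXT': ['text'], 'POST': [post]})
# Keep the word (p[0]) of every (word, tag) pair whose tag (p[1]) is 'NN'
df['WORDS_NN'] = df['POST'].map(lambda post_data: [p[0] for line in post_data for p in line if p[1] == 'NN'])
df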
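Note that the snippet above stores the whole nested list in a single cell, whereas the question describes one list of tuples per row. For that layout the inner loop over `line` is not needed. Here is a minimal sketch, assuming the column is named 'POS_TAGS' as in the question; the sample rows and the 'WORDS_NN' column name are only illustrative:

```python
import pandas as pd

# Hypothetical sample data: one list of (word, tag) tuples per row,
# matching the layout described in the question.
df = pd.DataFrame({
    'POS_TAGS': [
        [('word1', 'NN'), ('word2', 'VB'), ('word3', 'NN')],
        [('word4', 'JJ'), ('word5', 'NN')],
    ]
})

target_tag = 'NN'

# Per-row result: a new column holding the matching words for each row.
df['WORDS_NN'] = df['POS_TAGS'].map(
    lambda pairs: [word for word, tag in pairs if tag == target_tag]
)

# Flat result: a single list across all rows (the comprehension from the question).
all_words_with_target_tag = [
    word for row in df['POS_TAGS'] for word, tag in row if tag == target_tag
]

print(df['WORDS_NN'].tolist())      # [['word1', 'word3'], ['word5']]
print(all_words_with_target_tag)    # ['word1', 'word3', 'word5']
```

Because the cells hold ordinary Python objects, pandas has little to vectorize here, so a plain list comprehension like the one in the question is likely already a reasonable choice even for a large DataFrame.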
Beyond this, consider reading about spaCy, a Python library. It is very useful for NLP tasks such as filtering words by POS tag.
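As for how the list comprehension in the question works: it is equivalent to two nested loops with a filter in the innermost one. The sketch below uses plain Python, with a hypothetical `rows` list standing in for `df['POS_TAGS']`:

```python
# Hypothetical stand-in for df['POS_TAGS']: an iterable of rows,
# each row being a list of (word, tag) tuples.
rows = [
    [('word1', 'NN'), ('word2', 'VB'), ('word3', 'NN')],
    [('word4', 'JJ'), ('word5', 'NN')],
]
target_tag = 'NN'

# Comprehension form, as in the question.
from_comprehension = [
    word for row in rows for word, tag in row if tag == target_tag
]

# Equivalent explicit loops: the `for` clauses nest left to right,
# and the trailing `if` filters inside the innermost loop.
from_loops = []
for row in rows:
    for word, tag in row:
        if tag == target_tag:
            from_loops.append(word)

assert from_comprehension == from_loops
print(from_loops)  # ['word1', 'word3', 'word5']
```

Both forms visit every tuple exactly once, so the choice between them is mostly a matter of readability.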
Upvotes: 0