Unable to attach extracted POS-tagged noun phrases to a pandas DataFrame

I am trying to extract only nouns and noun phrases from address data (a column in a CSV file).

I was able to remove the stop words, punctuation, and numbers from the data, and I was also able to POS-tag it, but I am not able to extract the noun phrases and attach them back to the data frame. Can you tell me what went wrong?

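    import string

    import nltk
    from nltk import pos_tag_sents, word_tokenize

    # Standard English stop words plus location terms specific to this data
    stopwords = nltk.corpus.stopwords.words('english')
    user_defined_stop_words = ['hong', 'kong', 'hk', 'kowloon', 'hongkong']
    new_stop_words = stopwords + user_defined_stop_words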

    data['Clean_addr'] = data['Adj_Addr'].apply(lambda x: ' '.join([item.lower() for item in x.split()]))
    data['Clean_addr']=data['Clean_addr'].apply(lambda x:"".join([item.lower() for item in x if  not  item.isdigit()]))
    data['Clean_addr']=data['Clean_addr'].apply(lambda x:"".join([item.lower() for item in x if item not in string.punctuation]))
    data['Clean_addr'] = data['Clean_addr'].apply(lambda x: ' '.join([item.lower() for item in x.split() if item not in (new_stop_words)]))

    texts = data['Clean_addr'].tolist()
    tagged_texts = pos_tag_sents(map(word_tokenize, texts))
    data['POS']=tagged_texts
    data['POS']=data['POS'].apply(lambda x:' '.join([item[0] for item in x if (item[0][1]=='NNP' or item[0][1]=='NNS')]))

Sample dump of the file I am using:

https://www.dropbox.com/s/allhfdxni0kfyn6/Test.csv?dl=0

Upvotes: 0

Views: 340

Answers (1)

Bharath M Shetty

Reputation: 30605

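Based on the linked data: each item in a tagged row is a (word, tag) tuple, so item[0][1] is the second character of the word, not its tag. With data['POS'] still holding the tagged tokens, compare the tag at index 1 instead: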

    data['POS'].apply(lambda x : ','.join([i[0] for i in x if (i[1]=='NNS' or i[1] =='NNP')]))

    0               des
    1               des
    2           cfa,des
    3     registrations
    4                  
    5            floors
    6            queens
    7            queens
    8            queens
    9                  
    10       solicitors
    11                 
    12                 
    13                 
    14                 
    15              des
    Name: POS, dtype: object
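Several rows above come back empty because only NNS and NNP are kept. The Penn Treebank tag set also has NN and NNPS for nouns, and since the addresses were lowercased earlier, the tagger may label many of them as NN. Here is a minimal sketch of a wider filter, plus a crude way to group adjacent nouns into phrases. The names NOUN_TAGS, extract_nouns and noun_runs are made up for illustration, and the sample rows are hand-written stand-ins for pos_tag_sents output, not taken from the linked file.

```python
# Penn Treebank noun tags; the one-liner above keeps only NNS and NNP.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_nouns(tagged, tags=NOUN_TAGS):
    """Return the words whose POS tag is in `tags`, joined by commas."""
    return ",".join(word for word, tag in tagged if tag in tags)


def noun_runs(tagged, tags=NOUN_TAGS):
    """Group consecutive noun-tagged words into simple phrases.

    This is a rough approximation of noun-phrase chunking, not a parser.
    """
    phrases, current = [], []
    for word, tag in tagged:
        if tag in tags:
            current.append(word)
        elif current:
            # A non-noun token ends the current run.
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hand-written (word, tag) pairs standing in for one row of tagged output.
sample = [("queens", "NNS"), ("road", "NN"), ("central", "JJ"), ("floors", "NNS")]

print(extract_nouns(sample))
print(noun_runs(sample))
```

Assuming data['POS'] holds the tagged tokens, either function could be applied per row and assigned to a new column, for example data['Nouns'] = data['POS'].apply(extract_nouns), which attaches the result back to the data frame.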

Upvotes: 1
