Reputation: 111
I want to replace every word that is not an adjective or a noun (e.g., verbs, adverbs...) with a special string (e.g., "NIL").
That is to say, for a text like:
anarchism originated as a term of abuse first used against early working class radicals
I first do POS tagging (universal tagset), which gives the tagged text:
anarchism/NOUN originated/VERB as/ADP a/DET term/NOUN of/ADP abuse/NOUN first/ADV used/VERB against/ADP early/ADJ working/NOUN class/NOUN radicals/NOUN
and I want to obtain text like this:
anarchism/NOUN NIL NIL NIL term/NOUN NIL abuse/NOUN NIL NIL NIL NIL working/NOUN class/NOUN radicals/NOUN
which preserves the nouns and adjectives while replacing the other words with a special string (like "NIL").
Is there an efficient way to do this in Python? My corpus could be 10 GB or larger.
Thanks a lot!
Upvotes: 0
Views: 966
Reputation: 1207
You can also use this regex: \w*/(?!NOUN)[A-Z]*
>>> import re
>>> s = "anarchism/NOUN originated/VERB as/ADP a/DET term/NOUN of/ADP abuse/NOUN first/ADV used/VERB against/ADP early/ADJ working/NOUN class/NOUN radicals/NOUN"
>>> re.sub(r"\w*/(?!NOUN)[A-Z]*", "NIL", s)
'anarchism/NOUN NIL NIL NIL term/NOUN NIL abuse/NOUN NIL NIL NIL NIL working/NOUN class/NOUN radicals/NOUN'
You can test it here.
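The question also asks to keep adjectives and mentions a 10 GB+ corpus, which the one-liner above does not cover. Below is a minimal sketch of how the same lookahead idea might be extended, assuming tokens are whitespace-separated `word/TAG` pairs. The names `MASK_RE`, `mask_line` and `mask_file` are made up for illustration, and the file paths are whatever you pass in.

```python
import re

# Assumption: tokens are separated by whitespace and look like word/TAG.
# \S+/            the word part, up to the last slash of the token
# (?!...)         reject the token when the tag is exactly NOUN or ADJ
# [^\s/]+(?!\S)   the tag itself, which must run to the end of the token
MASK_RE = re.compile(r"\S+/(?!(?:NOUN|ADJ)(?!\S))[^\s/]+(?!\S)")

def mask_line(line, filler="NIL"):
    """Replace every token whose tag is not NOUN or ADJ with `filler`."""
    return MASK_RE.sub(filler, line)

def mask_file(src_path, dst_path):
    """Process a corpus line by line so it never has to fit in memory."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(mask_line(line))
```

Compiling the pattern once and iterating over the file object keeps memory use flat regardless of corpus size. If lines are independent, the work could probably also be split across processes, but that is untested here.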
Upvotes: 2
Reputation: 138
Try splitting the string into words and checking the tag of each one:
string = 'anarchism/NOUN originated/VERB as/ADP a/DET term/NOUN of/ADP abuse/NOUN first/ADV used/VERB against/ADP early/ADJ working/NOUN class/NOUN radicals/NOUN'
words = string.split(' ')
temp = ''
for a in words:
    # keep tokens tagged as nouns, replace everything else
    if a.endswith('/NOUN'):
        temp += a + ' '
    else:
        temp += 'NIL '
string = temp
print(string)
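A variation on the loop above, in case it helps: a sketch that compares the tag exactly, keeps adjectives as well as nouns (as the question asks), and joins the pieces once instead of concatenating in the loop. `KEEP_TAGS` and `mask_tokens` are hypothetical names, and it assumes the tag is whatever follows the last `/` in each token.

```python
KEEP_TAGS = {"NOUN", "ADJ"}  # assumption: the tags the question wants to preserve

def mask_tokens(line, keep=KEEP_TAGS, filler="NIL"):
    """Keep tokens whose tag is in `keep`; replace all others with `filler`."""
    out = []
    for token in line.split():
        # rpartition splits on the LAST slash, so words containing '/' still work
        _, sep, tag = token.rpartition("/")
        # sep is empty when the token has no slash at all (untagged token)
        out.append(token if sep and tag in keep else filler)
    return " ".join(out)

print(mask_tokens("first/ADV used/VERB against/ADP early/ADJ working/NOUN"))
```

Note that `split()` plus `join` normalises runs of whitespace to single spaces and drops the trailing newline, so add it back when writing results to a file.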
Upvotes: 1