Reputation: 97
I have a sentence like this:
s = " zero/NN divided/VBD by/IN anything/NN is zero/NN"
I need to replace every word/tag pair with just its tag. The output should be:
s = "NN VBD IN NN is NN"
I tried a regex replacement like this:
tup = re.sub( r"\s*/$" , "", s)
but this does not give me the correct output. Please help.
Upvotes: 1
Views: 251
Reputation: 92976
This gives the output you want:
tup = re.sub( r"\b\w+/" , "", s)
\b matches a word boundary, \w+ matches one or more word characters (a-zA-Z0-9_), and / matches the literal slash that follows them.
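A minimal, self-contained sketch of the substitution above, assuming the sample string from the question. The name tags_only is just an illustrative choice. Note that the input starts with a space, so the result does too.

```python
import re

s = " zero/NN divided/VBD by/IN anything/NN is zero/NN"

# Remove every "word/" prefix so only the tag after the slash remains.
# "is" has no slash after it, so it is left untouched.
tags_only = re.sub(r"\b\w+/", "", s)

print(repr(tags_only))          # the leading space of the input is preserved
print(tags_only.strip())        # strip() removes it if it is not wanted
```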
Upvotes: 3
Reputation: 11366
tup = re.sub( r"\b\w+/(\w+)\b", r"\1", s)
Both ends of the regex have \b, meaning "word boundary", and on either side of the "/" is \w+, meaning one or more word characters. The \w+ on the right is wrapped in parentheses, which makes it a capture group.
The replacement r"\1" means "the first group", i.e. whatever the parentheses matched.
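To make the role of the group concrete, here is a small sketch that runs the same pattern twice: once with the r"\1" backreference and once with a replacement function that returns the first group explicitly. The variable names are illustrative only; both calls should produce the same string.

```python
import re

s = " zero/NN divided/VBD by/IN anything/NN is zero/NN"
pattern = r"\b\w+/(\w+)\b"

# Replacement string: \1 is substituted with the text captured by group 1.
with_backref = re.sub(pattern, r"\1", s)

# Equivalent replacement function: return group 1 of each match object.
with_function = re.sub(pattern, lambda m: m.group(1), s)

print(repr(with_backref))
print(with_backref == with_function)
```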
Upvotes: 0
Reputation: 14854
Try:
tup = re.sub( r"[a-z]*/" , "", s)
In [1]: s = " zero/NN divided/VBD by/IN anything/NN is zero/NN"
In [2]: tup = re.sub( r"[a-z]*/" , "", s)
In [3]: print(tup)
NN VBD IN NN is NN
Upvotes: 2
Reputation: 41306
The \s character class matches whitespace characters, which doesn't seem to be what you want. I think you want the opposite, \S, which matches non-whitespace characters. You can also be more specific about what counts as a tag, for example:
tup = re.sub( r"\S+/([A-Z]+)" , r"\1", s)
This replaces each run of non-whitespace characters followed by a slash and a sequence of uppercase letters with just the uppercase letters.
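A short sketch of both points, assuming the string from the question. It first shows that the pattern from the question leaves the string unchanged (it only matches a slash at the very end of the string), then applies the \S+ version. The second input, with an apostrophe in a word, is a made-up example to show where \S+ and \w+ can differ.

```python
import re

s = " zero/NN divided/VBD by/IN anything/NN is zero/NN"

# The original attempt needs a "/" at the very end of the string, so nothing matches.
unchanged = re.sub(r"\s*/$", "", s)

# \S+ consumes the word, then the slash; the uppercase tag is captured and kept.
tags_only = re.sub(r"\S+/([A-Z]+)", r"\1", s)

# Hypothetical input: an apostrophe is not a word character, so \w+ only strips "t/".
s2 = "don't/VB stop/VB"
with_word_chars = re.sub(r"\b\w+/", "", s2)
with_non_space = re.sub(r"\S+/([A-Z]+)", r"\1", s2)

print(repr(tags_only))
print(repr(with_word_chars), repr(with_non_space))
```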
Upvotes: 0