Reputation: 97

replacing using regex python

I have a sentence like this

s = " zero/NN  divided/VBD  by/IN  anything/NN is zero/NN"

I need to replace each word/tag pair with just its tag. The output should be

s = "NN VBD IN NN is NN"

I tried using regex replace like this

tup = re.sub( r"\s*/$" , "", s)

but this does not give me the correct output. Please help.

Upvotes: 1

Answers (4)

stema

Reputation: 92976

This gives the output you want:

tup = re.sub( r"\b\w+/" , "", s)

\b matches a word boundary, \w+ matches one or more word characters (a-zA-Z0-9_ for ASCII text), and / matches the literal slash, so each word and its slash are removed and only the tag remains.
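As a minimal sketch of the substitution described above, run on the string from the question: the pattern only removes the "word/" part, so the spacing of the input is left as it was. The `normalized` step is an assumption added here for illustration, in case the exact single-spaced output from the question is needed; it is not part of the answer.

```python
import re

s = " zero/NN  divided/VBD  by/IN  anything/NN is zero/NN"

# Remove every "word/" prefix; the tag after the slash is left in place.
tup = re.sub(r"\b\w+/", "", s)

# The substitution does not touch whitespace, so the original spacing survives.
# Splitting on whitespace and re-joining collapses it (an extra step that is
# an assumption of this sketch, not part of the answer).
normalized = " ".join(tup.split())

print(repr(tup))
print(repr(normalized))
```

Here `normalized` should come out as "NN VBD IN NN is NN", while `tup` keeps the leading space and the double spaces of the input.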

Upvotes: 3

stew

Reputation: 11366

tup = re.sub( r"\b\w+/(\w+)\b", r"\1", s)

Both ends of the regex are anchored with \b, meaning "word boundary", and on either side of the "/" is \w+, meaning "one or more word characters". The \w+ on the right is wrapped in parentheses so that it is captured as a group.

The second argument, r"\1", means "the first group", which refers to whatever the parentheses captured.
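To make the role of the group more concrete, here is a small sketch using the string from the question. The variable names `pattern` and `tags` are introduced here only for illustration; `re.findall` is used just to show what the parentheses capture.

```python
import re

s = " zero/NN  divided/VBD  by/IN  anything/NN is zero/NN"
pattern = r"\b\w+/(\w+)\b"

# With exactly one capture group, findall returns the text that group matched
# for each match, i.e. the part to the right of the slash.
tags = re.findall(pattern, s)

# In the replacement string, \1 stands for the text captured by group 1,
# so each "word/TAG" match is replaced by "TAG".
tup = re.sub(pattern, r"\1", s)

print(tags)
print(repr(tup))
```

Note that "is" has no slash, so it never matches and is left untouched by the substitution.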

Upvotes: 0

avasal

Reputation: 14854

Try:

tup = re.sub( r"[a-z]*/" , "", s)

In [1]: s = " zero/NN divided/VBD by/IN anything/NN is zero/NN"
In [2]: tup = re.sub( r"[a-z]*/" , "", s)
In [3]: print(tup)
 NN VBD IN NN is NN

Upvotes: 2

Lukáš Lalinský

Reputation: 41306

The \s character class matches whitespace characters, which doesn't seem to be what you want. I think you want the opposite, \S, which matches non-whitespace characters. You can also be more specific about what counts as a tag, for example:

tup = re.sub( r"\S+/([A-Z]+)" , r"\1", s)

This replaces a run of non-whitespace characters, followed by a slash and a sequence of uppercase letters, with just the uppercase letters.
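The following sketch contrasts the pattern from the question with the one above. The second input string, "usr/bin zero/NN", is a made-up example added here to show the effect of restricting tags to uppercase letters.

```python
import re

s = " zero/NN  divided/VBD  by/IN  anything/NN is zero/NN"

# The pattern from the question: optional whitespace followed by a slash at
# the very end of the string. s does not end with a slash, so nothing matches
# and the string comes back unchanged.
unchanged = re.sub(r"\s*/$", "", s)

# Non-whitespace run, slash, uppercase tag: keep only the captured tag.
tup = re.sub(r"\S+/([A-Z]+)", r"\1", s)

# Hypothetical input: "usr/bin" has lowercase letters after the slash, so it
# is not treated as a word/tag pair and is left alone.
mixed = re.sub(r"\S+/([A-Z]+)", r"\1", "usr/bin zero/NN")

print(unchanged == s)
print(repr(tup))
print(repr(mixed))
```

As with the other answers, the whitespace of the input is preserved; only the word and slash in front of each tag are dropped.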

Upvotes: 0
