Removing variable-length characters from a string in Python

Question

I have strings of the form below, where some of the words have a <..> tag next to them:

The is a string.
This is another string.

They are read in from a text file one line at a time. I want to separate these into words. For that I am just splitting the string using split().

Now I have a set of words, but the first word will be The followed by its <..> tag rather than just The. The same goes for the other words that have <..> next to them. I want to remove the <..> from the words.

I'd like to do this in one line. What I mean is that I want to pass a parameter of the form <*>, as I would on the command line. I was thinking of using the replace() function for this, but I am not sure what the replace() parameter would look like.

For example, how could I change <..> below so that it matches anything between < and >:

x = x.replace("<..>", "")

user2555451 · Accepted Answer

Unfortunately, str.replace does not support regex patterns; it only replaces literal substrings. You need to use re.sub for this:

>>> from re import sub
>>> sub("<[^>]*>", "", "The is a string.")
'The is a string.'
>>> sub("<[^>]*>", "", "This is another string.")
'This is another string.'
>>>

[^>]* matches zero or more characters that are not >.
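To tie this back to the question's workflow (read a line, strip the tags, split into words), here is a minimal sketch. The sample lines and their tag contents (<1>, <ab>, and so on) are invented for illustration, and the helper name words_without_tags is hypothetical; neither comes from the original post.

```python
import re

# Compile once, since the same pattern is applied to every line.
# "<", then zero or more characters that are not ">", then ">".
TAG = re.compile(r"<[^>]*>")

def words_without_tags(line):
    """Strip every <...> run from the line, then split on whitespace."""
    return TAG.sub("", line).split()

# Hypothetical sample lines; the tag contents are made up for this example.
lines = ["The<1> is<ab> a<xyz> string.", "This<22> is<> another string."]
for line in lines:
    print(words_without_tags(line))

# str.replace treats its argument literally, so a wildcard-style
# argument such as "<*>" matches nothing and the string is unchanged.
print("The<1> is".replace("<*>", ""))
```

Because [^>]* can match zero characters, an empty tag such as <> is removed as well. Stripping the tags before calling split() keeps the whole thing to a single expression per line, which is what the question asked for.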
