Writing a regular expression for getting all words after a specific character

Question

I have a file in which every line has the format title - news_source. I want to replace everything after the title with a single space.

So far I only have the pattern \s-\s, but I don't know what pattern to write for the news_source.

Can somebody guide me through the process of writing the regex for the news_source? Thanks!

The fourth bird · Accepted Answer

You can match \s-\s.* and replace the match with a space (or with an empty string if you don't want the trailing whitespace).

Note that \s can also match a newline. If you want to match only whitespace characters other than newlines, you can use [^\S\n]-[^\S\n].* instead.
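To see why the newline matters, here is a small sketch where a line break comes directly before the hyphen. The input string is made up for illustration; it is not from the question's file.

```python
import re

# Hypothetical input: the " - " sits right after a line break
s = "Some title\n- Source"

# \s also matches "\n", so the line break is swallowed by the match
with_s = re.sub(r"\s-\s.*", "", s)

# [^\S\n] means "any whitespace except newline", so nothing matches here
without_newline = re.sub(r"[^\S\n]-[^\S\n].*", "", s)

print(repr(with_s))           # 'Some title'
print(repr(without_newline))  # 'Some title\n- Source'
```

With \s the two lines are merged into one; with [^\S\n] the text is left alone because no single line contains " - ".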

import re

s = ("title - news_source\n"
     "Airbnb stock has 15% upside after an 'impressive' earning report, says BofA - Business Insider")
# Replace " - " and everything after it on the same line with a single space
result = re.sub(r"\s-\s.*", " ", s)
print(result)

Output

title
Airbnb stock has 15% upside after an 'impressive' earning report, says BofA
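Since the question is about a whole file, here is a minimal sketch that applies the same idea line by line. The names strip_sources and process_file, and the UTF-8 encoding, are assumptions for illustration; io.StringIO stands in for a real file so the example runs without one.

```python
import io
import re

# [^\S\n] = whitespace other than a newline, as in the answer's second pattern
SOURCE_SUFFIX = re.compile(r"[^\S\n]-[^\S\n].*")


def strip_sources(lines):
    """Yield each line with its ' - news_source' part replaced by one space.

    '.' does not match '\n', so each line keeps its own line ending.
    """
    for line in lines:
        yield SOURCE_SUFFIX.sub(" ", line)


def process_file(in_path, out_path):
    """Hypothetical helper: write the cleaned lines of in_path to out_path."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(strip_sources(src))


# io.StringIO stands in for an open file in this demo
demo = io.StringIO(
    "title - news_source\n"
    "Airbnb stock has 15% upside after an 'impressive' earning report, "
    "says BofA - Business Insider\n"
)
cleaned = list(strip_sources(demo))
print(cleaned)
```

Compiling the pattern once avoids re-parsing it for every line, and iterating over the file object keeps memory use flat even for large files.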

If there should be at least one non-whitespace character \S at the start, you can use a capture group and put the group followed by a space in the replacement.

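# re.M makes ^ match at the start of every line, not only at the start of the string
print(re.sub(r"^(\S.*)[^\S\n]-[^\S\n].*", r"\1 ", s, flags=re.M))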
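One detail worth knowing about the capture-group version: because (\S.*) is greedy, it splits at the last " - " on a line, while \s-\s.* splits at the first. The sketch below uses made-up headlines (including one that contains a hyphen of its own); which behaviour you want depends on your data.

```python
import re

# Hypothetical headline whose title itself contains " - "
s = "Apple - Samsung rivalry heats up - Reuters"

# \s-\s.* matches from the FIRST " - ", cutting the title short
first = re.sub(r"\s-\s.*", " ", s)

# The greedy (\S.*) group backtracks only as far as needed,
# so the split happens at the LAST " - " on the line
last = re.sub(r"^(\S.*)[^\S\n]-[^\S\n].*", r"\1 ", s, flags=re.M)

# A line starting with whitespace fails ^(\S...) and is left unchanged
indented = re.sub(r"^(\S.*)[^\S\n]-[^\S\n].*", r"\1 ", "  indented - Source", flags=re.M)

print(repr(first))     # 'Apple '
print(repr(last))      # 'Apple - Samsung rivalry heats up '
print(repr(indented))  # '  indented - Source'
```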
