Marc Monforte

Reputation: 31

Want to remove '\n' from items that do not match regex

I have a block of text containing a non-uniform list that looks, for example, like the following:

1234:5678 words.words
1234:567 words
1234:5678 wordswords
targetMe
1234:678 words
targetMe

I also have a regex, something like the following, that lets me act on the items that do match it (i.e., every line except those that do not start with numbers):

fooRegex = re.compile(r'(\d{4}:\d+\s.*')

How can I target the lines that don't match the regex and remove the \n in front of them, so that each one is joined to the previous line with a comma? In the end, I want something that looks like the following:

1234:5678 words.words
1234:567 words
1234:5678 wordswords,targetMe
1234:678 words,targetMe

Or is there a better way to go about this than regular expressions?

Upvotes: 0

Views: 144

Answers (2)

xiience

Reputation: 467

Regex seems fine here. However, your pattern is invalid as written: it has an extra ( at the beginning, so re.compile will raise an error.

I believe this does what you're looking for:

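import re

# Named `text` rather than `input` to avoid shadowing the built-in input().
text = """1234:5678 words.words
1234:567 words
1234:5678 wordswords
targetMe
1234:678 words
targetMe"""

# Corrected pattern: the stray opening parenthesis is removed.
fooRegex = re.compile(r'\d{4}:\d+\s.*')

# Prefix matching lines with '\n' and all other lines with ',';
# match() anchors at the start of the line, and [1:] drops the leading prefix.
output = ''.join(
    '\n' + line if fooRegex.match(line) else ',' + line
    for line in text.split('\n')
)[1:]

print(output)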

It splits the text into lines, then prefixes each line with '\n' if it matches the regex or with ',' if it does not. The prefixed pieces are joined back into a single string, and [1:] chops off the leading '\n'.
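
If the one-liner is hard to follow, the same idea can be unrolled into a loop. This is only a sketch: the helper name merge_continuations and the sep parameter are made up for illustration, and it assumes a non-matching line always belongs to the line directly above it.

```python
import re

# Corrected pattern from this answer (trailing .* dropped; it is not needed for a prefix test).
LINE_START = re.compile(r'\d{4}:\d+\s')

def merge_continuations(text, sep=','):
    """Append every line that does not match LINE_START to the previous line.

    Hypothetical helper: the name and the `sep` parameter are illustrative only.
    """
    merged = []
    for line in text.split('\n'):
        if LINE_START.match(line) or not merged:
            merged.append(line)        # line starts a new record
        else:
            merged[-1] += sep + line   # glue onto the previous record
    return '\n'.join(merged)

sample = """1234:5678 words.words
1234:567 words
1234:5678 wordswords
targetMe
1234:678 words
targetMe"""

result = merge_continuations(sample)
print(result)
```

Building a list of records and appending to the last one avoids the leading-separator trick, so no [1:] is needed at the end.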

Upvotes: 0

zwer

Reputation: 25799

You don't even need regex for this, but if you want to use one, a negative lookahead can select only the newlines that are not followed by a numbered prefix, and you can then replace those with a comma:

import re

data = """1234:5678 words.words
1234:567 words
1234:5678 wordswords
targetMe
1234:678 words
targetMe"""

DATA_FIXER = re.compile(r"\n(?!\d{4}:\d+)")  # compiled once, in case you want to reuse it

data_fix = DATA_FIXER.sub(",", data)
# 1234:5678 words.words
# 1234:567 words
# 1234:5678 wordswords,targetMe
# 1234:678 words,targetMe
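
As for the regex-free route mentioned at the top, here is a minimal sketch using only string methods. The helper names starts_record and join_continuations are hypothetical, and the check assumes that a "real" line always begins with a token of the form four digits, a colon, then more digits, followed by a space. That is slightly stricter than the lookahead above, which does not look at what follows the digits.

```python
def starts_record(line):
    """Return True if the first space-separated token looks like 'DDDD:digits'.

    Assumed record format; adjust if the real data differs.
    """
    token = line.split(' ', 1)[0]
    left, colon, right = token.partition(':')
    return (
        colon == ':'
        and len(left) == 4
        and left.isdecimal()
        and right.isdecimal()   # also False when nothing follows the colon
    )

def join_continuations(text, sep=','):
    """Join each non-record line onto the line before it, using `sep`."""
    out = []
    for line in text.splitlines():
        if out and not starts_record(line):
            out[-1] += sep + line
        else:
            out.append(line)
    return '\n'.join(out)

data = """1234:5678 words.words
1234:567 words
1234:5678 wordswords
targetMe
1234:678 words
targetMe"""

data_fix = join_continuations(data)
print(data_fix)
```

For a format this simple either approach works; the regex version is shorter, while the plain-string version makes the "what counts as a record line" rule explicit and easy to change.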

Upvotes: 1
