DaShield

Reputation: 51

How to use join and regex?

I'm trying to insert a newline (\n) after each quotation mark (") that is followed by a space.

The closest I could find is re.sub; however, it removes certain characters.

import re

line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
q = re.sub(r'[\d\w]" ', '\n', line)
print(q)

Output:

Type: "SecurityInciden\nRowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F2\n

I'm looking for a solution that doesn't remove any characters.

Upvotes: 2

Views: 504

Answers (3)

The fourth bird

Reputation: 163467

Your regex removes the t from SecurityIncident because the pattern matches it and the replacement does not put it back.

Another option is to split on a double quote followed by a space, but only when the quote is preceded by a word character, which a positive lookbehind can check.

Then join the pieces back together with a newline.

(?<=\w)" 

For example:

import re
line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
print("\n".join(re.split(r'(?<=\w)" ', line)))

Result

Type: "SecurityIncident
RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"
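
Note that the split above consumes the quote together with the space, so the closing quote of the first value is dropped. If that quote should be kept, one possible variant (a sketch, not part of the original answer) is to move the quote into the lookbehind so that only the space is consumed:

```python
import re

line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'

# The lookbehind covers both the word character and the quote,
# so the split consumes only the space that follows them.
parts = re.split(r'(?<=\w") ', line)
result = "\n".join(parts)
print(result)
```

The lookbehind has a fixed width of two characters, which Python's re module requires.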

Upvotes: 0

Pushpesh Kumar Rajwanshi

Reputation: 18357

Your attempted regex [\d\w]" is almost right but has a few shortcomings. You don't need \d alongside \w in the character class, because \w already includes the digits. And since \w alone matches a letter, digit, or underscore, the enclosing [] is not needed either, so the regex simplifies to \w" followed by a space.
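
The redundancy can be checked directly. A small sketch, where the sample string is made up purely for illustration:

```python
import re

# Hypothetical sample text containing letters, digits, an underscore and punctuation.
sample = 'Row_ID 42: "F29"'

with_digit_class = re.findall(r'[\d\w]', sample)
word_only = re.findall(r'\w', sample)

# Both patterns pick out exactly the same characters.
print(with_digit_class == word_only)
```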

But if you replace a match of this regex with \n, the whole match is replaced: the letter t, the ", and the space. That is why you get this output:

SecurityInciden\nRowID
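
To see exactly what gets replaced, the matched text can be listed with re.finditer. This is only an illustrative sketch using the line from the question:

```python
import re

line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'

# Each match spans three characters: a word character, the quote, and the space.
matches = [m.group(0) for m in re.finditer(r'\w" ', line)]
print(matches)
```

Everything listed in matches is what re.sub throws away when the replacement is just \n.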

You need to capture the part you want to keep in group 1 and reference it in the replacement so that it is not lost. Use \1\n as the replacement instead of just \n.

Try this updated regex:

(\w" )

and replace it with \1\n

Notice that this leaves an extra space at the end of the first line. If you don't want that space, move it outside the capturing parentheses and use this regex:

(\w") 
     ^ space here

Here is sample Python code:

import re

line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
q = re.sub(r'(\w") ', r'\1\n', line)
print(q)

Output:

Type: "SecurityIncident"
RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"

Upvotes: 1

robertu

Reputation: 116

Try this:

import re
line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'
# Raw string so that \w and \s reach the regex engine unchanged; no flags are needed here
pattern = re.compile(r'(\w+): (".+?"\s?)')
q = pattern.sub(r'\g<1>: \g<2>\n', line)
print(repr(q))

It should give you the following result (repr shows the newlines as \n):

'Type: "SecurityIncident" \nRowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"\n'
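
The result above keeps a trailing space on the first line and a trailing newline at the end. If those are unwanted, a possible alternative (a sketch, assuming every field has the form Name: "value" and the values contain no embedded double quotes) is to extract the pairs with re.findall and join them:

```python
import re

line = 'Type: "SecurityIncident" RowID: "FB013B06-B04C-4FEB-A5A5-3B858F910F29"'

# Assumption: each field looks like  Name: "value"  with no quotes inside the value.
pairs = re.findall(r'(\w+): ("[^"]*")', line)

# Rebuild one field per line; no whitespace is left at the line ends.
result = "\n".join(f"{name}: {value}" for name, value in pairs)
print(result)
```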

Upvotes: 0
