Norfeldt

Reputation: 9688

Python RegEx clean-up randomly placed new lines

I have the following text and want to use a regex to clean up the stray newlines inside each paragraph:

Quality risk management. A systematic process for the assessment, control,
communication and review of risks to quality across the lifecycle. (ICH Q9)

Simulated agents. A material that closely approximates the physical and, where
practical, the chemical characteristics, e.g. viscosity, particle size, pH etc., of the product
under validation.

State of control. A condition in which the set of controls consistently provides assurance
of acceptable process performance and product quality.

Traditional approach. A product development approach where set points and operating
ranges for process parameters are defined to ensure reproducibility.

Worst Case. A condition or set of conditions encompassing upper and lower processing
limits and circumstances, within standard operating procedures, which pose the greatest
chance of product or process failure when compared to ideal conditions. Such conditions
do not necessarily induce product or process failure.


User requirements Specification (URS). The set of owner, user and engineering
requirements necessary and sufficient to create a feasible design meeting the intended
purpose of the system.

This almost works: re.sub(r'\w(?

but it also removes the last character of one line and the first character of the next. How do I avoid this?

Here is the same example on regex101:

https://regex101.com/r/5uEsJR/1

Upvotes: 0

Views: 34

Answers (1)

anubhava

Reputation: 785316

Your regex matches a \w before and after the \n, and because those characters are not put back in the replacement, they are lost.
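To illustrate the loss, here is a minimal sketch. The pattern in the question is cut off, so `\w\n\w` below is an assumed stand-in rather than the asker's exact regex, and `sample` is a made-up input:

```python
import re

# Hypothetical sample input, not the asker's exact text.
sample = "assessment, control\ncommunication and review"

# Assumed stand-in pattern: the \w on each side is part of the match,
# so both word characters are replaced together with the newline.
lossy = re.sub(r'\w\n\w', ' ', sample)
print(lossy)
```

The "l" before the newline and the "c" after it are consumed by the match, so they disappear from the result.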

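You can use lookarounds instead, which check the surrounding characters without consuming them (text here stands for your input string):

re.sub(r'(?<=\w)\n(?=\w)', ' ', text)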


  • (?<=\w): Assert that we have a word character before
  • \n: Match a line break character
  • (?=\w): Assert that we have a word character next
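As a rough end-to-end sketch of the lookaround approach, the snippet below applies it to two of the paragraphs from the question. The variable names `text` and `cleaned` are only illustrative:

```python
import re

# Two paragraphs from the question, with the line breaks as posted.
text = (
    "State of control. A condition in which the set of controls consistently provides assurance\n"
    "of acceptable process performance and product quality.\n"
    "\n"
    "Traditional approach. A product development approach where set points and operating\n"
    "ranges for process parameters are defined to ensure reproducibility.\n"
)

# Replace a newline with a space only when a word character sits on both
# sides. The lookarounds assert those characters but do not consume them.
cleaned = re.sub(r'(?<=\w)\n(?=\w)', ' ', text)
print(cleaned)
```

The blank line between paragraphs survives, because neither of its newlines has a word character on both sides. Note that `\w` does not match punctuation, so a line that ends with a comma (such as "control," in the first paragraph of the question) is not joined by this pattern. Whether that matters depends on the input.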

Upvotes: 1
