Reputation: 9688
I have the following text and want to clean up the newlines with a regex:
Quality risk management. A systematic process for the assessment, control,
communication and review of risks to quality across the lifecycle. (ICH Q9)
Simulated agents. A material that closely approximates the physical and, where
practical, the chemical characteristics, e.g. viscosity, particle size, pH etc., of the product
under validation.
State of control. A condition in which the set of controls consistently provides assurance
of acceptable process performance and product quality.
Traditional approach. A product development approach where set points and operating
ranges for process parameters are defined to ensure reproducibility.
Worst Case. A condition or set of conditions encompassing upper and lower processing
limits and circumstances, within standard operating procedures, which pose the greatest
chance of product or process failure when compared to ideal conditions. Such conditions
do not necessarily induce product or process failure.
User requirements Specification (URS). The set of owner, user and engineering
requirements necessary and sufficient to create a feasible design meeting the intended
purpose of the system.
This almost works: re.sub(r'\w(?
but it also removes the last character of one line and the first character of the next. How do I avoid this?
Here is the same example on regex101:
https://regex101.com/r/5uEsJR/1
Upvotes: 0
Views: 34
Reputation: 785316
Since your regex matches a \w before and after the \n, and those characters are not put back in the replacement, they get lost.
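As a rough sketch of why this happens: the pattern in the question is cut off, so the `\w\n\w` below is only a stand-in that I am assuming behaves the same way, and `sample` is a made-up two-line string. Anything the pattern matches is replaced, including the two word characters. Capturing them and writing them back is one way to keep them.

```python
import re

# Made-up two-line string, for illustration only
sample = "operating\nranges"

# Stand-in consuming pattern (assumption: the question's pattern is truncated).
# The "g" and the "r" are part of the match, so they are replaced as well.
consumed = re.sub(r"\w\n\w", " ", sample)
print(consumed)  # operatin anges

# Capture both characters and put them back in the replacement
restored = re.sub(r"(\w)\n(\w)", r"\1 \2", sample)
print(restored)  # operating ranges
```

One caveat with the capture-group version: matches cannot overlap, so a line consisting of a single word character would only be joined on one side.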
You can use lookarounds, passing your input string (called text here) as the third argument:
re.sub(r'(?<=\w)\n(?=\w)', ' ', text)
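To try the lookaround version on real input, here is a small sketch using a short excerpt of the sample text from the question. The variable names `text` and `cleaned` are my own choices, not anything from the question.

```python
import re

# Short excerpt of the sample text from the question
text = (
    "State of control. A condition in which the set of controls consistently provides assurance\n"
    "of acceptable process performance and product quality.\n"
    "Traditional approach. A product development approach where set points and operating\n"
    "ranges for process parameters are defined to ensure reproducibility."
)

# Replace only newlines that sit directly between two word characters.
# The lookarounds check the neighbours without consuming them.
cleaned = re.sub(r"(?<=\w)\n(?=\w)", " ", text)
print(cleaned)
```

The newline after "quality." survives because "." is not a word character, so each definition stays on its own line. By the same rule, a wrapped line that ends in punctuation, such as the one ending in "control," in the sample, would not be joined; whether that matters depends on your text.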
(?<=\w): Assert that we have a word character before
\n: Match a line break character
(?=\w): Assert that we have a word character next
Upvotes: 1