Reputation: 250
I'm trying to filter headings from a big document.
Like this:
5.1.8 Reports
5 technische en applicatiearchitectuur
this version number 5.5.5 (or 5.5) should stay in the text but the 2 sentences above should be removed
The problem is that I don't want to remove any version numbers etc. I tried (\d.), but is there a way to write a regex that only removes headers and leaves the version numbers in the text?
Upvotes: 2
Views: 270
Reputation: 627327
You can use
(?m)^(\d+(?:\.\d+)*\.?)\h+.*
and replace with the $1 backreference, which keeps the heading number captured in Group 1 and drops the rest of the heading line.
In Java:
String result = s.replaceAll("(?m)^(\\d+(?:\\.\\d+)*\\.?)\\h+.*", "$1");
Details
(?m)^ - start of the line
(\d+(?:\.\d+)*\.?) - Group 1:
    \d+ - 1 or more digits
    (?:\.\d+)* - 0+ sequences of a . followed with 1+ digits
    \.? - an optional dot
\h+ - 1 or more horizontal whitespace
.* - the rest of the line

String s = "5.1.8 Reports\n\n5 technische en applicatiearchitectuur\n\nthis version number 5.5.5 (or 5.5) should stay in the text but the 2 sentences above should be removed";
String result = s.replaceAll("(?m)^(\\d+(?:\\.\\d+)*\\.?)\\h+.*", "$1");
System.out.println(result);
Result
5.1.8
5
this version number 5.5.5 (or 5.5) should stay in the text but the 2 sentences above should be removed
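If the heading lines should disappear entirely instead of being reduced to their numbers, the same pattern can be adapted: drop the capturing group, consume the trailing line break with \R?, and replace with an empty string. The following is only a minimal sketch under that assumption. The class name HeadingFilter and the method names keepNumbers and removeHeadings are hypothetical, chosen for illustration. Note that any body line that itself starts with a number followed by whitespace would also be treated as a heading, so the pattern may need tightening for real documents.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class HeadingFilter {

    // Pattern from the answer: a dotted number at the start of a line,
    // then horizontal whitespace, then the rest of the line.
    private static final Pattern HEADING_KEEP_NUMBER =
            Pattern.compile("(?m)^(\\d+(?:\\.\\d+)*\\.?)\\h+.*");

    // Assumed variant: no capturing group, and \R? also consumes the
    // line break that ends the heading line (if there is one).
    private static final Pattern HEADING_WHOLE_LINE =
            Pattern.compile("(?m)^\\d+(?:\\.\\d+)*\\.?\\h+.*\\R?");

    // Keeps only the heading number, as in the answer above.
    static String keepNumbers(String text) {
        return HEADING_KEEP_NUMBER.matcher(text).replaceAll("$1");
    }

    // Removes the whole heading line, including its number.
    static String removeHeadings(String text) {
        return HEADING_WHOLE_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "5.1.8 Reports\n\n"
                + "5 technische en applicatiearchitectuur\n\n"
                + "this version number 5.5.5 (or 5.5) should stay in the text";
        System.out.println(keepNumbers(s));
        System.out.println("---");
        // Blank lines that separated the headings are left in place.
        System.out.println(removeHeadings(s));
    }
}
```

In both variants the version numbers inside the sentence survive because ^ only matches at the start of a line, and 5.5.5 and 5.5 appear mid-line. The blank lines that surrounded the removed headings remain, so a follow-up trim or blank-line cleanup may be wanted.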
Upvotes: 2