Reputation: 2134
I'm regex-impaired, so apologies for that, and thanks in advance to whoever can help me with this.
I have text as follows:
real text that i want to keep i e 2 2 1 i h i i i E h i L h R 9 more real text
i e 1 i tr L h R 1 i L ? i j 1 more real text that i want to keep d i j 0 etc...
You can see the sections of "junk" text that occur; these are what I want to remove. I'm not necessarily looking for 100% accuracy, but I'd like a regex that gets rid of most of these sections. I consider junk text to be any run of four or more consecutive tokens, each one or two characters long and followed by a space.
As noted in the tags, I am working with C#. Thanks again.
Upvotes: 2
Views: 239
Reputation: 2516
Something like this?
\b(.{1,2}\s){4,}
You can obviously replace the full stop/period with a more specific character class if you know which characters to allow.
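For what it's worth, here is a minimal C# sketch of how this could be wired up. It is not the exact pattern above: as an assumption on my part, it swaps the period for \S (any non-whitespace character), because the period also matches a space and the match can then start at the space before the junk, gluing the surrounding words together. The class and method names (JunkFilter, RemoveJunk) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JunkFilter
{
    // Assumption: \S replaces the "." from the pattern above so that a
    // token can never contain a space. (?: ) is a non-capturing group.
    // Reads as: word boundary, then 4 or more tokens of 1-2 non-space
    // characters, each followed by one whitespace character.
    private static readonly Regex JunkRun = new Regex(@"\b(?:\S{1,2}\s){4,}");

    // Returns the input with every matched junk run removed.
    public static string RemoveJunk(string input)
    {
        return JunkRun.Replace(input, string.Empty);
    }

    public static void Main()
    {
        string line = "real text that i want to keep i e 2 2 1 i h i i i E h i L h R 9 more real text";
        Console.WriteLine(RemoveJunk(line));
        // prints: real text that i want to keep more real text
    }
}
```

Two caveats with this sketch: a run of four or more genuinely short words (say "I am so on it") is removed as well, and a junk run at the very end of the input keeps its last token if no whitespace follows it. Since 100% accuracy isn't required, that may be acceptable.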
Upvotes: 3