I'm making a word frequency program and I'm trying to preprocess the text to make it manageable. I want to remove all special characters, except that $%^*+-=,./<> should be kept when they are part of a number. I have virtually no experience with regular expressions, and after reading a bunch about them I tried using a negative lookahead and a negative lookbehind to get something like
String replace = "[^a-z0-9\\\\s] | (?<!\\d)[$%^*+\\-=,./<>_] | [$%^*+\\-=,./<>_](?!\\d)";
text.replaceAll(replace, "");
In short, I want "they're." to become "theyre", but I want "1223.444" to remain unchanged.
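A minimal check of the attempt above, assuming it is used exactly as posted, shows two reasons it appears to do nothing. The class name AttemptCheck and the helper applyAttempt are hypothetical, added only for illustration.

```java
public class AttemptCheck {
    // The question's pattern, copied verbatim. "\\\\s" in Java source becomes the
    // regex \\s, i.e. a literal backslash and 's', not whitespace. The spaces around
    // each | are literal spaces that must appear in the input for a match.
    static final String ATTEMPT =
        "[^a-z0-9\\\\s] | (?<!\\d)[$%^*+\\-=,./<>_] | [$%^*+\\-=,./<>_](?!\\d)";

    // Hypothetical helper: applies the attempted pattern and returns the result.
    static String applyAttempt(String text) {
        return text.replaceAll(ATTEMPT, "");
    }

    public static void main(String[] args) {
        String text = "they're.";

        // Strings are immutable: replaceAll returns a new String, so calling it
        // without using the result leaves text as it was.
        text.replaceAll(ATTEMPT, "");
        System.out.println(text); // they're.

        // Even with the result used, every alternative needs a literal space next to
        // the matched char, and "they're." contains none, so nothing is removed.
        System.out.println(applyAttempt(text)); // they're.
    }
}
```

Conversely, where a space does follow a special character, the first alternative removes the space along with it, so applyAttempt("a, b") yields "ab".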
You can use
text = text.replaceAll("[\\p{P}\\p{S}&&[^$%^*+=,./<>_-]]|[$%^*+=,./<>_-](?!(?<=\\d.)\\d)", "");
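A self-contained sketch applying the pattern above to the two sample strings from the question. KeepNumberSymbols and clean are hypothetical names. Precompiling with Pattern.compile is an optional choice: String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the pattern on each call, which may matter when cleaning many words in a word-frequency loop.

```java
import java.util.regex.Pattern;

public class KeepNumberSymbols {
    // Same regex as above: any punctuation/symbol outside the protected set, OR a
    // protected char that is not sitting between two digits.
    private static final Pattern STRIP = Pattern.compile(
        "[\\p{P}\\p{S}&&[^$%^*+=,./<>_-]]|[$%^*+=,./<>_-](?!(?<=\\d.)\\d)");

    // Removes every match; letters, digits and whitespace are never matched.
    static String clean(String text) {
        return STRIP.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("they're.")); // theyre
        System.out.println(clean("1223.444")); // 1223.444
    }
}
```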
Details:
[\p{P}\p{S}&&[^$%^*+=,./<>_-]] - a character class intersection construct that matches any punctuation (\p{P}) or symbol (\p{S}) char except $, %, ^, *, +, =, ,, ., /, <, >, _ and -
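| - or
[$%^*+=,./<>_-](?!(?<=\d.)\d) - a $, %, ^, *, +, =, ,, ., /, <, >, _ or - char that is not both immediately preceded by a digit and immediately followed by a digit. The negative lookahead blocks the match only when the next char is a digit (\d) and the lookbehind (?<=\d.), checked at that same position, sees a digit followed by any char (the . stands for the symbol/punctuation just consumed by [$%^*+=,./<>_-]). So these chars are removed everywhere except between two digits.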
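To see what each alternative does on its own, this sketch applies them separately. The class, field and method names are illustrative, and the inputs are made-up examples rather than data from the question.

```java
import java.util.regex.Pattern;

public class AlternativesDemo {
    // Alternative 1: punctuation/symbols outside the protected set are always removed.
    static final Pattern ALWAYS = Pattern.compile("[\\p{P}\\p{S}&&[^$%^*+=,./<>_-]]");

    // Alternative 2: a protected char is removed unless a digit sits on both sides.
    static final Pattern UNLESS_BETWEEN_DIGITS =
        Pattern.compile("[$%^*+=,./<>_-](?!(?<=\\d.)\\d)");

    // Removes every match of p; lookarounds are evaluated against the original input.
    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip(ALWAYS, "it's #1!"));                      // its 1
        System.out.println(strip(UNLESS_BETWEEN_DIGITS, "3.14. 2+2=4 a-b")); // 3.14 2+2=4 ab
        System.out.println(strip(UNLESS_BETWEEN_DIGITS, "-5"));             // 5
    }
}
```

The last line shows a consequence of the rule: a leading minus sign, as in -5, is stripped too, because nothing precedes it. If signed numbers matter for the word counts, the pattern would need adjusting.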