user3418292

Reputation: 23

Deleting all words matching a regex pattern

I would like to remove character sequences such as "htsap://" or "ftsap://" from a String. Is that possible?
Let me illustrate with an example.

Actual input String:

"Every Web page has a http unique address called a URL (Uniform Resource Locator) which identifies where it is located on the Web. For "ftsap://"example, the URL for CSM Library's home page is: "htsap://"www.smccd.edu/accounts/csmlibrary/index.htm The basic parts of a URL often provide \"clues\" to htsap://where a web page originates and who might be responsible for the information at that page or site."

Expected resulting String:

"Every Web page has a http unique address called a URL (Uniform Resource Locator) which identifies where it is located on the Web. For example, the URL for CSM Library's home page is: www.smccd.edu/accounts/csmlibrary/index.htm The basic parts of a URL often provide \"clues\" to where a web page originates and who might be responsible for the information at that page or site."

Patterns I tried (I am not sure this is the right approach):

((.*?)(?=("htsap://|ftsap://")))

and:

((.*?)(?=("htsap://|ftsap://")))(.*)

Could anyone suggest a solution?

Upvotes: 2

Views: 82

Answers (2)

ccjmne

Reputation: 9606

Since you're escaping your quotes within your sample Strings, I'll assume you're working in Java.

You should try:

final String res = input.replaceAll("\"?\\w+://\"?", "");

Here is a link to a working example showing exactly what this regex matches.


How it works:

It matches and removes any sequence of alphanumeric characters (and underscores), followed by :// and possibly preceded and/or followed by ".
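
The description above can be checked with a small self-contained program. This is only a sketch: the class name SchemePrefixDemo, the helper strip, and the shortened sample strings are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the strip helper are illustrative only.
final class SchemePrefixDemo {
    // Optional quote, one or more word characters, "://", optional quote.
    private static final Pattern SCHEME = Pattern.compile("\"?\\w+://\"?");

    // Removes every match of SCHEME from the given text.
    static String strip(String text) {
        return SCHEME.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Quoted and unquoted prefixes are both removed.
        System.out.println(strip("For \"ftsap://\"example, see htsap://where"));
        // A bare word such as "http" without "://" is left alone.
        System.out.println(strip("a http unique address"));
    }
}
```

Note that \w+ accepts any run of word characters, so this pattern would also strip a genuine prefix such as http:// if one appeared in the input.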


EDIT: How to achieve the same result using a Matcher?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String input = "Every Web page has a http unique address called a URL (Uniform Resource Locator) which identifies where it is located on the Web. For \"ftsap://\"example, the URL for CSM Library's home page is: \"htsap://\"www.smccd.edu/accounts/csmlibrary/index.htm The basic parts of a URL often provide \"clues\" to htsap://where a web page originates and who might be responsible for the information at that page or site.";
final Pattern p = Pattern.compile("\"?\\w+://\"?");
final Matcher m = p.matcher(input); // one Matcher, one pass over the input
final StringBuffer b = new StringBuffer();
// Copy the text before each match into b and replace the match itself with "".
while (m.find()) {
    m.appendReplacement(b, "");
}
m.appendTail(b); // copy whatever follows the last match

System.out.println(b.toString());

Upvotes: 1

Amit Joki

Reputation: 59282

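Use this regex:

"?(ftsap|htsap)://"?

And replace every match with an empty string.

Regex explained:

"?(ftsap|htsap)://"? with flag g: an optional ", then either ftsap or htsap, then a literal ://, then another optional ". The g (global) flag replaces every occurrence rather than only the first; in Java, replaceAll already does this.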

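In Java, the alternation approach might look like the sketch below. It assumes the surrounding quotes are optional and that the separator is a literal colon rather than the any-character dot. The class name FixedSchemeDemo, the helper strip, and the sample string are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the strip helper are illustrative only.
final class FixedSchemeDemo {
    // Optional quote, then ftsap or htsap, then a literal "://", then an optional quote.
    private static final Pattern TARGET =
            Pattern.compile("\"?(?:ftsap|htsap)://\"?");

    // replaceAll removes every occurrence, so no separate global flag is needed.
    static String strip(String text) {
        return TARGET.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Only the two listed prefixes are removed; http:// is kept.
        System.out.println(strip("is: \"htsap://\"www.smccd.edu and http://example.org"));
    }
}
```

Listing the prefixes explicitly is stricter than \w+: anything that is not ftsap or htsap stays in the text.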

Upvotes: 0
