Reputation: 4595
Hi and first of all thanks for your help.
I need to perform some data manipulation on a large number of strings in Java.
This is an example of the kind of string I have to modify:
<span foreground="blue" weight="bold">accomplish</span> vt, vi, 완수하다, 성취하다, 달성하다, (학문과 기예를) 가르치다 <span foreground="blue" weight="bold">accomplish</span> a, prep, 완성한, 숙달한, 소양(교양)이 있는
What I need to do:
delete from the above string all the <span ...>...</span> elements.
I also need to take out the grammatical labels: the a, the vt, the vi, the prep, and so on.
I don't want to remove every occurrence of those characters, only those specific labels (a, vt, vi, prep, and a few others).
EDIT: So the expected output would be:
완수하다, 성취하다, 달성하다, (학문과 기예를) 가르치다 완성한, 숙달한, 소양(교양)이 있는
I guess I would have to use a regex, but I am a complete novice at this.
Could somebody please give me some help?
Thanks
Upvotes: 0
Views: 301
Reputation: 135752
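Use String#replaceAll() with the regex:
<span.*?/span>
The ? after .* makes the quantifier lazy (non-greedy), so each match stops at the first /span> instead of running on to the last one in the string.
.replaceAll() takes a regex as its first argument, whereas .replace() takes a literal String (a CharSequence, actually).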
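To see the difference between the two methods, here is a small sketch; the class and method names (ReplaceVsReplaceAll, literal, regex) are only illustrative. The same "." argument is a literal dot for replace() but the "any character" metacharacter for replaceAll():

```java
public class ReplaceVsReplaceAll {
    // replace() treats its first argument as a literal CharSequence
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll() compiles its first argument as a regex; "." matches any character
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c")); // prints a-b-c
        System.out.println(regex("a.b.c"));   // prints -----
    }
}
```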
Java code:
String s = "<span foreground=\"blue\" weight=\"bold\">accomplish</span> vt, vi, 완수하다, 성취하다, 달성하다, (학문과 기예를) 가르치다 <span foreground=\"blue\" weight=\"bold\">accomplish</span> a, prep, 완성한, 숙달한, 소양(교양)이 있는 ";
System.out.println(s.replaceAll("<span.*?/span>", ""));
Output:
vt, vi, 완수하다, 성취하다, 달성하다, (학문과 기예를) 가르치다 a, prep, 완성한, 숙달한, 소양(교양)이 있는
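The laziness of .*? matters as soon as a string contains more than one span, as the example string does. A minimal comparison, using a made-up input and hypothetical helper names (greedy, lazy):

```java
public class GreedyVsLazy {
    // Greedy: .* grabs as much as possible, so the match runs to the LAST "/span>"
    static String greedy(String s) {
        return s.replaceAll("<span.*/span>", "");
    }

    // Lazy: .*? grabs as little as possible, so each match ends at the FIRST "/span>"
    static String lazy(String s) {
        return s.replaceAll("<span.*?/span>", "");
    }

    public static void main(String[] args) {
        String s = "<span>x</span> keep <span>y</span> tail";
        System.out.println("[" + greedy(s) + "]"); // prints [ tail]  ("keep" is lost)
        System.out.println("[" + lazy(s) + "]");   // prints [ keep  tail]
    }
}
```

By default . does not match line terminators, so if a span could ever contain a line break, prefix the pattern with the (?s) (DOTALL) flag.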
If you need to take out more tokens, add them to the regex as alternatives separated by the | (alternation) operator. For instance:
<span.*?/span>|a,|vt,|vi,|prep,|whateverYouWantDontForgetToEscape
Working code:
System.out.println(s.replaceAll("<span.*?/span>|a,|vt,|vi,|prep,", ""));
Output:
완수하다, 성취하다, 달성하다, (학문과 기예를) 가르치다 완성한, 숙달한, 소양(교양)이 있는
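One caveat with the plain a, alternative: it also matches the end of any word that ends in "a" followed by a comma (for example idea,). That cannot happen in the Korean glosses shown here, but if your data also contains English text, a sketch like the one below may be safer. It assumes each label is preceded by a space or the start of the string, builds the alternation from a list (LABELS, LabelStripper and strip() are hypothetical names), escapes each label with Pattern.quote(), and puts \b in front so a label only matches as a whole word. Since you have a large number of strings, it also compiles the Pattern once and reuses it, because String.replaceAll() recompiles the regex on every call.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LabelStripper {
    // Hypothetical label list: add the other labels you need to remove
    static final List<String> LABELS = List.of("a", "vt", "vi", "prep");

    // Builds <span.*?/span>|\b(?:\Qa\E|\Qvt\E|...), once; Pattern.quote escapes each label
    static final Pattern TOKENS = Pattern.compile(
            "<span.*?/span>|\\b(?:"
                    + LABELS.stream().map(Pattern::quote).collect(Collectors.joining("|"))
                    + "),");

    static String strip(String s) {
        // Reuses the compiled pattern instead of recompiling it on every call
        return TOKENS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "a good idea, a, vt, word";
        // The unanchored "a," alternative also eats the end of "idea,"
        System.out.println("[" + s.replaceAll("a,|vt,", "") + "]"); // prints [a good ide   word]
        // \b requires a word boundary before the label, so "idea," survives
        System.out.println("[" + strip(s) + "]"); // prints [a good idea,   word]
    }
}
```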
Based on the expected output you just posted, you also want to remove the extra spaces left behind. For that, also consume any whitespace that follows each removed token:
(<span.*?/span>|a,|vt,|vi,|prep,)\s*
Java code:
System.out.println(s.replaceAll("(<span.*?/span>|a,|vt,|vi,|prep,)\\s*", ""));
Output:
완수하다, 성취하다, 달성하다, (학문과 기예를) 가르치다 완성한, 숙달한, 소양(교양)이 있는
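An alternative to folding the whitespace into the regex is a two-pass sketch: remove the tokens first, then collapse every run of whitespace into a single space and trim the ends. Unlike the regex above, this also normalizes whitespace that is not next to a removed token and drops the trailing space. The class and method names are hypothetical, and the sample input uses English words in place of the Korean glosses:

```java
import java.util.regex.Pattern;

public class CollapseSpaces {
    // Pass 1: the same tokens as in the regex above
    static final Pattern TOKENS = Pattern.compile("<span.*?/span>|a,|vt,|vi,|prep,");
    // Pass 2: any run of one or more whitespace characters
    static final Pattern SPACES = Pattern.compile("\\s+");

    static String clean(String s) {
        String noTokens = TOKENS.matcher(s).replaceAll("");
        // Collapse whitespace runs into one space, then trim both ends
        return SPACES.matcher(noTokens).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String s = "<span weight=\"bold\">accomplish</span> vt, vi, complete, achieve "
                + "<span weight=\"bold\">accomplish</span> a, prep, finished, skilled ";
        System.out.println("[" + clean(s) + "]"); // prints [complete, achieve finished, skilled]
    }
}
```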
Upvotes: 4