Reputation: 43
I've been searching Stack Overflow but couldn't find anyone with this kind of problem.
I want to do something like this:
Input String:
<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Object>
<Section>Fruit</Section>
<Category>Bananas</Category>
<Brand>Chiquita</Brand>
<Obs><p>
Vende-se a peças ou o conjunto.</p><br>
</Obs>
</Object>
</List>
What I want is to strip HTML tags such as <p>, <br>, etc., so that it ends up like this:
<?xml version="1.0" encoding="UTF-8" ?>
<List>
<Object>
<Section>Fruit</Section>
<Category>Bananas</Category>
<Brand>Chiquita</Brand>
<Obs>
Vende-se a peças ou o conjunto.
</Obs>
</Object>
</List>
I have been playing around with Jsoup, but I can't seem to make it work properly.
This is the code I have:
Whitelist whitelist = Whitelist.none();
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><List><Object><Section>Fruit</Section><Category>Bananas</Category><Brand>Chiquita</Brand><Obs><p>Vende-se a peças ou o conjunto.</p><br></Obs></Object></List>";
whitelist.addTags(new String[]{"?xml", "List", "Object", "Section", "Category", "Brand", "Obs"});
String safe = Jsoup.clean(xml, whitelist);
This is the result I am getting:
FruitBananasChiquitaVende-se a peças ou o conjunto.
Thanks in advance
Upvotes: 2
Views: 464
Reputation: 25380
You can use unwrap() to do so:
Example:
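import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class UnwrapExample {
    public static void main(String[] args) {
        final String input = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<List>\n"
                + " <Object>\n"
                + " <Section>Fruit</Section>\n"
                + " <Category>Bananas</Category>\n"
                + " <Brand>Chiquita</Brand>\n"
                + " <Obs><p>\n"
                + "Vende-se a peças ou o conjunto.</p><br>\n"
                + " </Obs>\n"
                + " </Object>\n"
                + "</List>";

        Document doc = Jsoup.parse(input, "", Parser.xmlParser()); // XML parser!
        doc.select("p").unwrap();  // unwraps all p tags, keeping their text
        doc.select("br").unwrap(); // unwraps (removes) all br tags
        System.out.println(doc);   // prints the resulting document
    }
}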
Also, it's better to use an XML parser instead of an HTML parser here, since the input is XML rather than HTML.
Output:
<?xml version="1.0" encoding="UTF-8" ?>
<list>
<object>
<section>
Fruit
</section>
<category>
Bananas
</category>
<brand>
Chiquita
</brand>
<obs>
Vende-se a peças ou o conjunto.
</obs> </object>
</list>
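If more than two tags need stripping, the same unwrap() idea can be wrapped in a small helper. This is only a sketch: the class StripHtmlTags and the method stripTags are hypothetical names, not part of jsoup. It loops over the tag names you pass in and, as an extra assumption, turns pretty-printing off via doc.outputSettings().prettyPrint(false) so the original whitespace is kept instead of being re-indented as in the output above. Whether the tag names keep their original case (<Obs> vs <obs>) depends on the jsoup version.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

/**
 * Hypothetical helper (not part of jsoup): parses the input leniently with
 * jsoup's XML parser and unwraps every element with one of the given tag
 * names, so the element disappears but its text content stays in place.
 */
class StripHtmlTags {

    static String stripTags(String xml, String... tagNames) {
        // XML parser: no <html>/<body> wrapper is added around <List>.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());

        // Unwrap each requested tag; children are moved up to the parent.
        for (String tag : tagNames) {
            doc.select(tag).unwrap();
        }

        // Assumption: keep the original whitespace rather than re-indenting.
        doc.outputSettings().prettyPrint(false);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<List>\n"
                + " <Object>\n"
                + " <Section>Fruit</Section>\n"
                + " <Obs><p>\n"
                + "Vende-se a peças ou o conjunto.</p><br>\n"
                + " </Obs>\n"
                + " </Object>\n"
                + "</List>";
        System.out.println(stripTags(input, "p", "br"));
    }
}
```

Passing further names, e.g. stripTags(input, "p", "br", "b", "i"), unwraps those as well; any element not named is left untouched.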
Upvotes: 2
Reputation: 11396
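Jsoup.clean() parses the input with the HTML parser, which lowercases tag names, so the whitelisted tags have to be lowercase as well: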
whitelist.addTags(new String[] { "?xml", "list", "object", "section",
"category", "brand", "obs" });
Output:
<list>
<object>
<section>
Fruit
</section>
<category>
Bananas
</category>
<brand>
Chiquita
</brand>
<obs>
Vende-se a peças ou o conjunto.
</obs></object>
</list>
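For reference, here is a self-contained version of the lowercase-whitelist fix. It assumes jsoup 1.14.1 or later, where Whitelist was renamed Safelist; on older versions, Whitelist offers the same none() and addTags(...) calls. The class name CleanWithSafelist is hypothetical. The "?xml" entry is dropped here because, as the output above shows, the XML declaration does not survive cleaning anyway.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

/** Hypothetical wrapper around Jsoup.clean() with a lowercase allow-list. */
class CleanWithSafelist {

    static String clean(String xml) {
        // Jsoup.clean() uses the HTML parser, which lowercases tag names,
        // so the allowed tags are listed in lowercase.
        Safelist allowed = Safelist.none()
                .addTags("list", "object", "section", "category", "brand", "obs");
        // Tags not on the list (p, br, ...) are removed; their text is kept.
        return Jsoup.clean(xml, allowed);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><List><Object>"
                + "<Section>Fruit</Section><Category>Bananas</Category>"
                + "<Brand>Chiquita</Brand><Obs><p>Vende-se a peças ou o conjunto.</p><br></Obs>"
                + "</Object></List>";
        System.out.println(clean(xml));
    }
}
```

The trade-off compared with the unwrap() answer: an allow-list must name every tag you want to keep, while unwrap() must name every tag you want to remove.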
Upvotes: 4