Achaius

Reputation: 6124

How to check whether a given string is an RSS feed

I have a string holding data downloaded from a given URL, which may be either XML or HTML. I want to check whether the downloaded string is an RSS feed or an HTML document before passing it to SAXParser. How can I do this?

For example:

If I download the data from http://rss.cnn.com/rss/edition.rss, the resulting string is an RSS feed.

If I download the data from http://edition.cnn.com/2014/06/19/opinion/iraq-neocons-wearing/index.html, the resulting string is an HTML document.

I want to continue processing only if the string is an RSS feed.

Upvotes: 0

Views: 1011

Answers (1)

mkrakhin

Reputation: 3486

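RSS is an XML format, while HTML is usually not well-formed XML (only XHTML is). So you can read your data as XML and validate it against an RSS XSD: an HTML page will either fail to parse or fail validation. Like this: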

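import java.io.IOException;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

URL schemaFile = new URL("http://europa.eu/rapid/conf/RSS20.xsd");
Source xmlFile = new StreamSource(YOUR_URL_HERE);
SchemaFactory schemaFactory = SchemaFactory
    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
try {
  validator.validate(xmlFile);
  // validation passed: the document conforms to the RSS 2.0 schema
} catch (SAXException e) {
  // NOT RSS: either not well-formed XML or not valid against the schema
} catch (IOException e) {
  // the document could not be read, so nothing can be said about it
}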

If you want to check the String itself, you can look for the typical RSS structure, such as the <rss> root element and the required <channel> element inside it. But I wouldn't recommend it.
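For completeness, here is a minimal sketch of that structural check, in case schema validation is too heavy for your use case. It uses the JDK's StAX parser to read only as far as the first element and tests whether it is named rss. The class name RssSniffer and the method looksLikeRss are hypothetical names chosen for illustration, and the sketch assumes the string starts directly with the document (no leading whitespace or byte order mark). It only recognizes feeds whose root element is rss, and it does not verify that the rest of the document is valid.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
public class RssSniffer {

    /** Returns true if the string parses as XML and its root element is named "rss". */
    public static boolean looksLikeRss(String content) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The input is untrusted, so do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(content));
            try {
                // Skip comments, processing instructions, etc. until the root element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamReader.START_ELEMENT) {
                        return "rss".equals(reader.getLocalName());
                    }
                }
                return false; // no element found at all
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML up to the root element, so treat it as not RSS.
            return false;
        }
    }
}
```

You would call RssSniffer.looksLikeRss(downloadedString) and hand the string to SAXParser only when it returns true. Since parsing stops at the first element, this is cheap, but a document that merely starts with an rss tag will also pass.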

Upvotes: 1
