superlogical

Reputation: 14950

C#: What is the best way to compute a hash of an XML feed?

I want to detect whether a feed has changed. The only way I can think of is to hash the contents of the XML document and compare that to the last hash of the feed.

I am using XmlReader because SyndicationFeed uses it, so ideally I don't want to load the syndication feed unless the feed has been updated.

XmlReader reader = XmlReader.Create("http://www.extremetech.com/feed");
SyndicationFeed feed = SyndicationFeed.Load(reader);

Upvotes: 3

Views: 667

Answers (3)

Maghis

Reputation: 1103

If you really want to go the hash way you can do the following:

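// download the raw feed bytes once
byte[] content;
using (var client = new WebClient())
{
    content = client.DownloadData("http://www.extremetech.com/feed");
}

// hash the raw bytes; any byte-level change produces a different hash
string hashString;
using (var md5 = MD5.Create())
{
    hashString = Convert.ToBase64String(md5.ComputeHash(content));
}

// you can then compare hashes and, if changed, load the feed this way
XmlReader reader = XmlReader.Create(new MemoryStream(content));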

Of course, this way you will detect any change in the content, even the slightest.

IMHO the best way to go is to load the feed anyway and hash just the contents of the articles. You can hash any string like this:

var toHash = "string to hash";

var hash = MD5.Create().ComputeHash(Encoding.UTF8.GetBytes(toHash));
var hashString = Convert.ToBase64String(hash);
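
As a rough sketch of that second option, assuming the feed has already been loaded into a SyndicationFeed: the hypothetical FeedFingerprint.Compute helper below (my own name, not part of any library) hashes only each item's Id, title and summary, so changes elsewhere in the XML are ignored. Which fields to include is an assumption; pick whatever identifies an article in your feed.

```csharp
using System;
using System.Security.Cryptography;
using System.ServiceModel.Syndication;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FeedFingerprint
{
    // Builds one hash over the Id, title and summary of every item in the feed.
    public static string Compute(SyndicationFeed feed)
    {
        var sb = new StringBuilder();
        foreach (SyndicationItem item in feed.Items)
        {
            // Title and Summary may be null, so guard before reading .Text
            sb.Append(item.Id).Append('\n');
            sb.Append(item.Title != null ? item.Title.Text : "").Append('\n');
            sb.Append(item.Summary != null ? item.Summary.Text : "").Append('\n');
        }

        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            return Convert.ToBase64String(hash);
        }
    }
}
```

You would store the returned string and compare it with the value from the previous poll; MD5 is used only to match the snippets above, and any hash function would do for change detection.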

Hope this helps.

Upvotes: 3

MerickOWA

Reputation: 7602

A hash approach won't work in this case because server-side caching adds an XML comment that changes very frequently, even when the actual feed never changes.

One thing you can do, which works for this feed, is use HTTP conditional requests to ask the server to give you the data only if it's actually been modified since the last time you requested it.

For example:

You'd have a global/member variable to hold the last-modified datetime from your feed:

    var lastModified = DateTime.MinValue;

Then each time you'd make a request like the following:

    var request = (HttpWebRequest)WebRequest.Create( "http://www.extremetech.com/feed" );
    request.IfModifiedSince = lastModified; 
    try {

      using ( var response = (HttpWebResponse)request.GetResponse() ) {

        lastModified  = response.LastModified;

        using ( var stream = response.GetResponseStream() ) {

          //*** parsing the stream
          var reader = XmlReader.Create( stream );
          SyndicationFeed feed = SyndicationFeed.Load( reader );
          }
        }
      }
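    catch ( WebException e ) {
      // a 304 Not Modified surfaces as a WebException; e.Response is null on connection-level failures
      var response = e.Response as HttpWebResponse;
      if ( response == null || response.StatusCode != HttpStatusCode.NotModified )
        throw; // rethrow an unexpected web exception
      }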

Upvotes: 2

Paul Sasik

Reputation: 81509

Why not just check the LastUpdatedTime of the feed? That's a built-in way of telling you whether something is new or not. Instead of hashing and storing a hash, you would simply keep track of the LastUpdatedTime and periodically compare it to the latest LastUpdatedTime:

using System;
using System.ServiceModel.Syndication;
using System.Xml;

public class MyClass
{
    private static DateTime _lastFeedTime = new DateTime(2011, 10, 10);

    public static void Main()
    {
        XmlReader reader = XmlReader.Create("http://www.extremetech.com/feed");
        SyndicationFeed feed = SyndicationFeed.Load(reader);

        if (feed.LastUpdatedTime.LocalDateTime > _lastFeedTime)
        {
            _lastFeedTime = feed.LastUpdatedTime.LocalDateTime;

            // load feed...
        }
    }
}

Upvotes: 3
