Brennan

Reputation: 11686

Formatting Twitter text (TweetText) with C#

Is there a better way to format Twitter text so that URLs, usernames and hashtags become links? What I have works, but I know it could be done better, and I am interested in alternative techniques. I am setting this up as an HTML helper for ASP.NET MVC.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;

namespace Acme.Mvc.Extensions
{

    public static class MvcExtensions
    {
        const string ScreenNamePattern = @"@([A-Za-z0-9\-_&;]+)";
        const string HashTagPattern = @"#([A-Za-z0-9\-_&;]+)";
        const string HyperLinkPattern = @"(http://\S+)\s?";

        public static string TweetText(this HtmlHelper helper, string text)
        {
            return FormatTweetText(text);
        }

        public static string FormatTweetText(string text)
        {
            string result = text;

            if (result.Contains("http://"))
            {
                var links = new List<string>();
                foreach (Match match in Regex.Matches(result, HyperLinkPattern))
                {
                    var url = match.Groups[1].Value;
                    if (!links.Contains(url))
                    {
                        links.Add(url);
                        result = result.Replace(url, String.Format("<a href=\"{0}\">{0}</a>", url));
                    }
                }
            }

            if (result.Contains("@"))
            {
                var names = new List<string>();
                foreach (Match match in Regex.Matches(result, ScreenNamePattern))
                {
                    var screenName = match.Groups[1].Value;
                    if (!names.Contains(screenName))
                    {
                        names.Add(screenName);
                        result = result.Replace("@" + screenName,
                           String.Format("<a href=\"http://twitter.com/{0}\">@{0}</a>", screenName));
                    }
                }
            }

            if (result.Contains("#"))
            {
                var names = new List<string>();
                foreach (Match match in Regex.Matches(result, HashTagPattern))
                {
                    var hashTag = match.Groups[1].Value;
                    if (!names.Contains(hashTag))
                    {
                        names.Add(hashTag);
                        result = result.Replace("#" + hashTag,
                           String.Format("<a href=\"http://twitter.com/search?q={0}\">#{1}</a>",
                           HttpUtility.UrlEncode("#" + hashTag), hashTag));
                    }
                }
            }

            return result;
        }

    }

}

Upvotes: 12

Views: 4731

Answers (3)

Steven de Salas

Reputation: 21467

There is a good resource for parsing Twitter messages at the link below; it worked for me:

How to Parse Twitter Usernames, Hashtags and URLs in C# 3.0

http://jes.al/2009/05/how-to-parse-twitter-usernames-hashtags-and-urls-in-c-30/

It contains support for:

  • URLs
  • #hashtags
  • @usernames

BTW: the regex in the ParseURL() method needs reviewing; it turns stock symbols (such as BARC.L) into links.
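One way to avoid the stock-symbol problem mentioned above is to treat text as a URL only when it starts with an explicit scheme. The following is a minimal sketch, not the linked article's code: the class name `UrlLinker`, the method `LinkUrls` and the pattern itself are assumptions made for illustration. It also assumes the input is already HTML-encoded, and it does not trim trailing punctuation from a URL.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the linked article.
public static class UrlLinker
{
    // Only text starting with http:// or https:// is treated as a URL,
    // so a bare "word.word" token such as BARC.L is left untouched.
    private static readonly Regex UrlPattern =
        new Regex(@"\bhttps?://[^\s<>""]+", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string LinkUrls(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        // Replace every match in one pass with an anchor that points at the URL.
        return UrlPattern.Replace(text,
            m => string.Format("<a href=\"{0}\">{0}</a>", m.Value));
    }
}
```

Calling `UrlLinker.LinkUrls("Bought BARC.L today")` returns the text unchanged, while a token such as `http://example.com/news` is wrapped in an anchor. The trade-off is that scheme-less links (for example `example.com/news`) are no longer detected.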

Upvotes: 0

Filix Mogilevsky

Reputation: 777

I created a helper method that shortens text to 140 characters, with room for the URL included. You can set the share length to 0 to exclude the URL from the tweet.

    // Requires: using System.Collections.Generic; using System.Linq;
    //           using System.Text; using System.Text.RegularExpressions;
    // As an extension method, this must live inside a static class.
    public static string FormatTwitterText(this string text, string shareurl)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        string sharepath = string.Format("http://url.com/{0}", shareurl);

        // All words, with embedded newlines removed and empty entries dropped.
        List<string> textlist = text.Split(' ')
                                    .Select(txt => Regex.Replace(txt, @"\n", "").Trim())
                                    .Where(formattedtxt => !string.IsNullOrEmpty(formattedtxt))
                                    .ToList();

        int extraChars = 3; // reserves 3 characters: the trailing ".." plus one spare
        int finalLength = 140 - sharepath.Length - extraChars;

        var finaltext = new StringBuilder();
        bool truncated = false;
        foreach (string word in textlist)
        {
            // Length of the result if this word is appended, including a separating space.
            int separator = finaltext.Length > 0 ? 1 : 0;
            if (finaltext.Length + separator + word.Length > finalLength)
            {
                truncated = true; // stop at the first word that does not fit
                break;
            }

            if (separator == 1)
                finaltext.Append(' ');
            finaltext.Append(word);
        }

        return truncated ? finaltext.ToString() + ".." : finaltext.ToString();
    }

Upvotes: 0

Rex M

Reputation: 144112

That is remarkably similar to the code I wrote that displays my Twitter status on my blog. The only further things I do are:

1) looking up @name and replacing it with <a href="http://twitter.com/name">Real Name</a>;

2) multiple @name's in a row get commas, if they don't have them;

3) Tweets that start with @name(s) are formatted "To @name:".

I don't see any reason this can't be an effective way to parse a tweet - they are a very consistent format (good for regex) and in most situations the speed (milliseconds) is more than acceptable.
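As an illustration of the regex approach, here is a minimal sketch of a single-pass variant. It is not the code from my blog: the class name `TweetFormatter`, the method `Format` and the combined pattern are assumptions for illustration, and the input is assumed to be HTML-encoded already. Matching everything in one pass means a `#anchor` inside a URL is consumed as part of the URL instead of being linked as a hashtag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical single-pass formatter; names and pattern are illustrative only.
public static class TweetFormatter
{
    // Three named alternatives. The URL alternative comes first, so "#" or "@"
    // inside a URL is matched as part of the URL. The lookbehinds skip e-mail
    // addresses (foo@bar) and HTML entities such as &#39;.
    private const string Pattern =
        @"(?<url>https?://[^\s<>""]+)" +
        @"|(?<![A-Za-z0-9_])@(?<name>[A-Za-z0-9_]+)" +
        @"|(?<![A-Za-z0-9_&])#(?<tag>[A-Za-z0-9_]+)";

    private static readonly Regex TokenPattern = new Regex(Pattern, RegexOptions.Compiled);

    public static string Format(string text)
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        return TokenPattern.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
                return string.Format("<a href=\"{0}\">{0}</a>", m.Groups["url"].Value);

            if (m.Groups["name"].Success)
                return string.Format("<a href=\"http://twitter.com/{0}\">@{0}</a>",
                    m.Groups["name"].Value);

            // Hashtag: percent-encode the leading "#" for the search query string.
            string tag = m.Groups["tag"].Value;
            return string.Format("<a href=\"http://twitter.com/search?q={0}\">#{1}</a>",
                Uri.EscapeDataString("#" + tag), tag);
        });
    }
}
```

Because each match is replaced where it is found, there is no need to track already-seen names, and `@bob` can no longer be rewritten inside `@bobby` the way a plain `string.Replace` would do it.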

Edit:

Here is the code for my Tweet parser. It's a bit too long to put in a Stack Overflow answer. It takes a tweet like:

@user1 @user2 check out this cool link I got from @user3: http://url.com/page.htm#anchor #coollinks

And turns it into:

<span class="salutation">
    To <a href="http://twitter.com/user1">Real Name</a>,
    <a href="http://twitter.com/user2">Real Name</a>:
</span> check out this cool link I got from
<span class="salutation">
    <a href="http://www.twitter.com/user3">Real Name</a>
</span>:
<a href="http://url.com/page.htm#anchor">http://url.com/...</a>
<a href="http://twitter.com/#search?q=%23coollinks">#coollinks</a>

It also wraps all that markup in a little JavaScript:

document.getElementById('twitter').innerHTML = '{markup}';

This is so the tweet fetcher can run asynchronously as a script, so if Twitter is down or slow it won't affect my site's page load time.
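When wrapping markup like this, the markup has to be escaped for a JavaScript string literal, otherwise a quote or line break in a tweet would break the script. A minimal sketch, assuming `System.Web` is available; the class name `TweetScript` and the method `WrapInScript` are hypothetical, and only the element id `twitter` comes from the snippet above:

```csharp
using System.Web;

// Hypothetical helper; only the element id "twitter" comes from the answer.
public static class TweetScript
{
    public static string WrapInScript(string markup)
    {
        // JavaScriptStringEncode escapes quotes, backslashes and line breaks,
        // so the markup cannot end the string literal early.
        string encoded = HttpUtility.JavaScriptStringEncode(markup ?? string.Empty);

        return "document.getElementById('twitter').innerHTML = '" + encoded + "';";
    }
}
```

The result can be served as a script that the page loads separately, which keeps the Twitter lookup off the page's critical path.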

Upvotes: 3
