discorax

Reputation: 1487

Get word frequency (count) as a property in a LINQ object using Regex

I'm trying to adapt the query from this post for my own purposes, but I can't figure out how.

Here is the starting query:

string input = sb.ToString();
string[] keywords = new[] { "i","be", "with", "are", "there", "use", "still", "do","out", "so", "will", "but", "if", "can", "your", "what", "just", "from", "all", "get", "about", "this","t", "is","and", "the", "", "a", "to", "http" ,"you","my", "for", "in", "of", "ly" , "com", "it", "on","s", "that", "bit", "at", "have", "m", "rt",  "an", "was", "as", "ll", "not", "me" };
Regex regex = new Regex("\\w+");
var stuff = regex.Matches(input)
    .OfType<Match>()
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .OrderByDescending(c => c.Count())
    .ThenBy(c => c.Key);

But I would like to get the count (frequency) of each Key value as well as the value itself, so that I can store both in my database.

foreach (var item in stuff)
{
    string query = String.Format("INSERT INTO sg_top_words (sg_word, sg_count) VALUES ('{0}','{1}')", item.Key, item.COUNT???);
    cmdIns = new SqlCommand(query, conn);
    cmdIns.CommandType = CommandType.Text;
    cmdIns.ExecuteNonQuery();
    cmdIns.Dispose();
}

Thanks

Upvotes: 0

Views: 1682

Answers (1)

Jon Skeet

Reputation: 1500475

Assuming the query is nearly what you're after, this tweak should do it:

var stuff = regex.Matches(input)
    .Cast<Match>() // We're confident everything will be a Match!
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .OrderByDescending(g => g.Count)
    .ThenBy(g => g.Word);

Now the sequence will be of an anonymous type, with Word and Count properties.
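To see the projected query produce a word and its count together, here is a minimal self-contained sketch. The class name `WordFrequencyDemo`, the helper `CountWords`, the sample sentence, and the short stop-word set are all made up for illustration and are not from the original post. It also swaps the anonymous type for a named tuple (C# 7 or later) only so the result can be returned from a method; inside a single method the anonymous type works the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFrequencyDemo
{
    // Hypothetical helper: returns each word that is not a stop word,
    // paired with how many times it occurs, most frequent first.
    public static List<(string Word, int Count)> CountWords(
        string input, HashSet<string> stopWords)
    {
        Regex regex = new Regex("\\w+");
        return regex.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Where(w => !stopWords.Contains(w))
            .GroupBy(w => w)
            // Project each group to its key and its size.
            .Select(g => (Word: g.Key, Count: g.Count()))
            .OrderByDescending(p => p.Count)
            .ThenBy(p => p.Word)
            .ToList();
    }

    public static void Main()
    {
        // Sample data, assumed for this sketch only.
        var stopWords = new HashSet<string> { "the", "a", "and" };
        var stuff = CountWords("The cat and the dog saw a cat", stopWords);

        foreach (var item in stuff)
        {
            // item.Word and item.Count are both available on each element.
            Console.WriteLine("{0}: {1}", item.Word, item.Count);
        }
    }
}
```

In the question's insert loop, `item.Word` and `item.Count` would then go where `item.Key` and the `item.COUNT???` placeholder are.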

Do you really need to order the results though, if you're just inserting them into a database? Could you just use this:

var stuff = regex.Matches(input)
    .Cast<Match>() // We're confident everything will be a Match!
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .Select(g => new { Word = g.Key, Count = g.Count() });

Upvotes: 3
