Partha

Reputation: 2192

How do I iterate over a string word by word in C#?

I want to iterate over a string word by word.

If I have a string "incidentno and fintype or unitno", I would like to read every word one by one as "incidentno", "and", "fintype", "or", and "unitno".

Upvotes: 8

Views: 27500

Answers (10)

Maxim Zaslavsky

Reputation: 18065

There are multiple ways to accomplish this. Two of the most convenient methods (in my opinion) are:

  • Using string.Split() to create an array. I would probably use this method, because it is the most self-explanatory.

example:

string startingSentence = "incidentno and fintype or unitno";
string[] seperatedWords = startingSentence.Split(' ');

Alternatively, you could use (this is what I would use):

string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);

StringSplitOptions.RemoveEmptyEntries will remove the empty entries that would otherwise appear in your array because of consecutive, leading, or trailing spaces.

Next - to process the words, you would use:

foreach (string word in seperatedWords)
{
//Do something
}
  • Or, you can use regular expressions to solve this problem, as Darin demonstrated (a copy is below).

example:

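// '-' is placed last in the class so it is a literal hyphen, not the range '.' to ':' (which also matches digits).
var regex = new Regex(@"\b[\s,.:;-]*");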
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

For processing, you can use similar code to the first option.

foreach (string word in words)
{
//Do something
}

Of course, there are many ways to solve this problem, but I think that these two would be the simplest to implement and maintain. I would go with the first option (using string.Split()) just because regex can sometimes become quite confusing, while a split will function correctly most of the time.
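
As a small illustration of the difference between the two Split calls above, here is a minimal sketch. The class and method names (SplitDemo, SplitKeepEmpty, SplitDropEmpty) are made up for this example and are not part of the answer.

```csharp
using System;

public static class SplitDemo
{
    // Plain Split(' '): consecutive spaces produce empty entries.
    public static string[] SplitKeepEmpty(string sentence)
    {
        return sentence.Split(' ');
    }

    // RemoveEmptyEntries drops those empty strings from the result.
    public static string[] SplitDropEmpty(string sentence)
    {
        return sentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the input "incidentno  and fintype" (two spaces after the first word), SplitKeepEmpty returns four entries, one of them empty, while SplitDropEmpty returns the three words.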

Upvotes: 2

Jeroen Vorsselman

Reputation: 823

I'd like to add some information to JDunkerley's answer.
You can easily make this method more flexible by giving it a string or char parameter to use as the separator.

public static IEnumerable<string> WordList(this string Text, string Word)
        {
            // Note: Word must be non-empty, otherwise the loop never advances.
            int sIndex = 0; // start of the current word
            int nIndex;     // position of the next separator
            while ((nIndex = Text.IndexOf(Word, sIndex, StringComparison.Ordinal)) != -1)
            {
                yield return Text.Substring(sIndex, nIndex - sIndex);
                sIndex = nIndex + Word.Length; // skip the whole separator
            }
            yield return Text.Substring(sIndex);
        }

public static IEnumerable<string> WordList(this string Text, char c)
        {
            int sIndex = 0; // start of the current word
            int nIndex;     // position of the next separator
            while ((nIndex = Text.IndexOf(c, sIndex)) != -1)
            {
                yield return Text.Substring(sIndex, nIndex - sIndex);
                sIndex = nIndex + 1;
            }
            // Also correct when the text contains no separator at all.
            yield return Text.Substring(sIndex);
        }

Upvotes: 0

singapore saravanan

Reputation: 21

public static string[] MyTest(string inword, string regstr)
{
    var regex = new Regex(regstr);
    // Split the string that was passed in rather than a hard-coded phrase.
    var words = regex.Split(inword);
    return words;
}

? MyTest("incidentno, and .fintype- or; :unitno", @"\W+")

[0]: "incidentno"
[1]: "and"
[2]: "fintype"
[3]: "or"
[4]: "unitno"

Upvotes: 0

x19

Reputation: 8783

I wrote a string processor class that you can use.

Example:

metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();

Class:

 public static class StringProcessor
{
    private static List<String> PrepositionList;

    public static string ToNormalString(this string strText)
    {
        if (String.IsNullOrEmpty(strText)) return String.Empty;
        char chNormalKaf = (char)1603;
        char chNormalYah = (char)1610;
        char chNonNormalKaf = (char)1705;
        char chNonNormalYah = (char)1740;
        string result = strText.Replace(chNonNormalKaf, chNormalKaf);
        result = result.Replace(chNonNormalYah, chNormalYah);
        return result;
    }

    public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
        List<String> blackListWords = null,
        int minimumWordLength = 3,
        char splitor = ' ',
        bool perWordIsLowerCase = true)
    {
        string[] btArray = bodyText.ToNormalString().Split(splitor);
        long numberOfWords = btArray.LongLength;
        Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
        foreach (string word in btArray)
        {
            if (word != null)
            {
                string lowerWord = word;
                if (perWordIsLowerCase)
                    lowerWord = word.ToLower();
                var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
                    .Replace("?", "").Replace("!", "").Replace(",", "")
                    .Replace("<br>", "").Replace(":", "").Replace(";", "")
                    .Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
                if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
                {
                    if (wordsDic.ContainsKey(normalWord))
                    {
                        var cnt = wordsDic[normalWord];
                        wordsDic[normalWord] = ++cnt;
                    }
                    else
                    {
                        wordsDic.Add(normalWord, 1);
                    }
                }
            }
        }
        List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
        return keywords;
    }

    public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
    {
        List<KeyValuePair<String, Int32>> result = null;
        if (isBasedOnFrequency)
            result = list.OrderByDescending(q => q.Value).ToList();
        else
            result = list.OrderByDescending(q => q.Key).ToList();
        return result;
    }

    public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
    {
        List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
        return result;
    }

    public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
    {
        List<String> result = new List<String>();
        foreach (var item in list)
        {
            result.Add(item.Key);
        }
        return result;
    }

    public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
    {
        List<Int32> result = new List<Int32>();
        foreach (var item in list)
        {
            result.Add(item.Value);
        }
        return result;
    }

    public static String AsString<T>(this List<T> list, string seprator = ", ")
    {
        // string.Join avoids the trailing separator and the repeated string concatenation.
        return string.Join(seprator, list);
    }

    private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
    {
        bool result = false;
        if (blackListWords == null) return false;
        foreach (var w in blackListWords)
        {
            if (w.ToNormalString().Equals(word))
            {
                result = true;
                break;
            }
        }
        return result;
    }
}

Upvotes: -1

ParmesanCodice

Reputation: 5035

When using split, what about checking for empty entries?

string sentence = "incidentno and fintype or unitno";
string[] words = sentence.Split(new char[] { ' ', ',' ,';','\t','\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
// Process
}

EDIT:

I can't comment, so I'm posting here, but this (posted in another answer) works:

foreach (string word in "incidentno and fintype or unitno".Split(' ')) 
{
   ...
}

My understanding of foreach is that it first calls GetEnumerator() and then calls MoveNext() until false is returned, so the Split won't be re-evaluated on each iteration.
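
To check the claim that the collection expression is evaluated only once, here is a minimal sketch. CountingSplit is a hypothetical wrapper around Split that exists only so the number of calls can be counted; nothing here comes from the other answers.

```csharp
using System;

public static class ForeachDemo
{
    // How many times the collection expression has been evaluated.
    public static int SplitCalls = 0;

    // Hypothetical wrapper around Split so that calls can be counted.
    public static string[] CountingSplit(string sentence)
    {
        SplitCalls++;
        return sentence.Split(' ');
    }

    // Iterates over the words and returns how many were seen.
    public static int CountWords(string sentence)
    {
        int words = 0;
        foreach (string word in CountingSplit(sentence))
        {
            words++;
        }
        return words;
    }
}
```

After CountWords("incidentno and fintype or unitno") has returned 5, SplitCalls is 1: the split ran once, however many words were iterated.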

Upvotes: 1

JDunkerley

Reputation: 12505

Slightly twisted I know, but you could define an iterator block as an extension method on strings. e.g.

    /// <summary>
    /// Sweep over text
    /// </summary>
    /// <param name="Text"></param>
    /// <returns></returns>
    public static IEnumerable<string> WordList(this string Text)
    {
        int sIndex = 0; // start of the current word
        int nIndex;     // position of the next space
        while ((nIndex = Text.IndexOf(' ', sIndex)) != -1)
        {
            yield return Text.Substring(sIndex, nIndex - sIndex);
            sIndex = nIndex + 1;
        }
        // Also correct for a single word: the first character is no longer dropped.
        yield return Text.Substring(sIndex);
    }

        foreach (string word in "incidentno and fintype or unitno".WordList())
            System.Console.WriteLine("'" + word + "'");

Which has the advantage of not creating a big array for long strings.
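
A rough sketch of what that laziness buys you, assuming you only need the first couple of words. LazyWords is a hypothetical stand-in for an iterator like the one above (written as a plain static method here), and the Produced counter exists only to make the behaviour visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyWordsDemo
{
    // Number of words the iterator has produced so far.
    public static int Produced = 0;

    // Space-separated word iterator; yields one word at a time on demand.
    public static IEnumerable<string> LazyWords(string text)
    {
        int start = 0;
        int next;
        while ((next = text.IndexOf(' ', start)) != -1)
        {
            Produced++;
            yield return text.Substring(start, next - start);
            start = next + 1;
        }
        Produced++;
        yield return text.Substring(start);
    }

    // Takes only the first two words; the rest of the string is never scanned.
    public static List<string> FirstTwo(string text)
    {
        return LazyWords(text).Take(2).ToList();
    }
}
```

Calling FirstTwo("incidentno and fintype or unitno") returns "incidentno" and "and", and Produced stays at 2, whereas Split would have built all five entries up front.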

Upvotes: 13

Darin Dimitrov

Reputation: 1039468

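// '-' is placed last in the class so it is a literal hyphen, not the range '.' to ':' (which also matches digits).
var regex = new Regex(@"\b[\s,.:;-]*");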
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

This works even if you have ".,; tabs and new lines" between your words.

Upvotes: 13

Mike Cooper

Reputation: 3058

Use the Split method of the string class

string[] words = "incidentno and fintype or unitno".Split(' ');

This will split on spaces, so "words" will have [incidentno,and,fintype,or,unitno].

Upvotes: 5

bbohac

Reputation: 351

Assuming the words are always separated by a blank, you could use String.Split() to get an Array of your words.
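
If that assumption does not hold and the words may also be separated by tabs or line breaks, one option is to pass a null separator array, which makes String.Split treat any white-space character as a delimiter. A minimal sketch (the class and method names are only for illustration):

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // A null separator array makes Split use white-space characters as delimiters;
    // RemoveEmptyEntries drops the empty strings produced by runs of white space.
    public static string[] Words(string sentence)
    {
        return sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With this, Words("incidentno\tand  fintype\nor unitno") gives the same five words as the plain space-separated sentence.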

Upvotes: 3

Guffa

Reputation: 700810

foreach (string word in "incidentno and fintype or unitno".Split(' ')) {
   ...
}

Upvotes: 20
