basvas

Reputation: 363

How do I verify that a string is in English?

I read a string from the console. How do I make sure it only contains English characters and digits?

Upvotes: 31

Views: 47990

Answers (13)

Sina Karvandi

Reputation: 1102

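Another way is to reject any character that is neither a lowercase letter, an uppercase letter, a digit, nor white space. Note that IsLower and IsUpper also return true for cased letters from other alphabets (for example é or Cyrillic letters), so this only filters out scripts without letter case. Something like: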

private bool IsAllCharEnglish(string Input)
{
 foreach (var item in Input.ToCharArray())
 {
  if (!char.IsLower(item) && !char.IsUpper(item) && !char.IsDigit(item) && !char.IsWhiteSpace(item))
  {
      return false;
  }
 }
 return true;
}

Usage:

string str = "فارسی abc";
IsAllCharEnglish(str); // returns false

str = "These are english 123";
IsAllCharEnglish(str); // returns true

Upvotes: 5

Ramil Shavaleev

Reputation: 402

Do not use Regex or LINQ; both are slower than looping over the characters of the string.

Performance test

My solution:

private static bool is_only_eng_letters_and_digits(string str)
{
   foreach (char ch in str)
   {
      if (!(ch >= 'A' && ch <= 'Z') && !(ch >= 'a' && ch <= 'z') && !(ch >= '0' && ch <= '9'))
      {
         return false;
      }
   }
   return true;
}

Upvotes: 3

Farzin Kanzi

Reputation: 3435

The accepted answer does not handle white space or punctuation. The code below was tested with this input:

Hello: 1. - a; b/c \ _(5)??
(Is English)

// Note: inside the character class, " -_" is a range from space (U+0020) to underscore (U+005F).
Regex regex = new Regex("^[a-zA-Z0-9. -_?]*$");


string text1 = "سلام";
bool fls = regex.IsMatch(text1);   //false

string text2 = "123 abc! ?? -_)(/\\;:";
bool tru = regex.IsMatch(text2);  //true
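
The reason characters such as ! and / pass the pattern above, even though they are not listed, is that " -_" inside a character class is read as a range from space to underscore. A minimal sketch, assuming you would rather list the allowed punctuation explicitly (the class name, method names and the stricter pattern are hypothetical, not part of the answer):

```csharp
using System.Text.RegularExpressions;

public static class PunctuationPatterns
{
    // The pattern from the answer: " -_" is the range U+0020..U+005F.
    private static readonly Regex RangePattern =
        new Regex("^[a-zA-Z0-9. -_?]*$");

    // Hypothetical stricter variant: the hyphen is escaped, so only the
    // listed punctuation (dot, space, hyphen, underscore, question mark)
    // is accepted alongside letters and digits.
    private static readonly Regex ExplicitPattern =
        new Regex(@"^[a-zA-Z0-9. \-_?]*$");

    public static bool MatchesRange(string text) => RangePattern.IsMatch(text);

    public static bool MatchesExplicit(string text) => ExplicitPattern.IsMatch(text);
}
```

With these definitions, MatchesRange("abc!") is true while MatchesExplicit("abc!") is false; both accept "a-b_c d.". Which behaviour you want depends on how much punctuation should count as "English".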

Upvotes: 11

Manvendra Rajpurohit

Reputation: 307

<?php
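    // $string = "हिन्दी";              // multi-byte sample: the two lengths differ
    $string = "Manvendra Rajpurohit";   // ASCII-only sample: the two lengths match
    echo strlen($string); echo '<br>';  // length in bytes
    echo mb_strlen($string, 'utf-8');   // length in characters
    echo '<br>';
    // A string contains only single-byte (ASCII) characters when both lengths agree.
    if(strlen($string) != mb_strlen($string, 'utf-8'))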
    { 
        echo "Please enter English words only:(";
    }
    else {
        echo "OK, English Detected!";
    }
?>

Upvotes: 0

Ivan I

Reputation: 9990

As many pointed out, the accepted answer works only if the string contains a single word. Since no other answer covers multiple words or whole sentences, here is code that does. It returns true when the string contains a letter outside the ASCII range, i.e. when the string is not English:

bool hasNonEnglishLetter = stringToCheck.Any(x => char.IsLetter(x) && !((int)x >= 63 && (int)x <= 126));

Upvotes: 0

LBushkin

Reputation: 131806

Assuming that by "English characters" you are simply referring to the 26-letter Latin alphabet, this would be an area where I would use regular expressions: ^[a-zA-Z0-9]*$

For example:

if( Regex.IsMatch(Console.ReadLine(), "^[a-zA-Z0-9]*$") )
{ /* your code */ }

The benefit of regular expressions in this case is that all you really care about is whether or not a string matches a pattern, and that is exactly where regular expressions work wonderfully. The pattern clearly captures your intent, and it's easy to extend if your definition of "English characters" expands beyond just the 26 alphabetic ones.

There's a decent series of articles here that teach more about regular expressions.

Jørn Schou-Rode's answer provides a great explanation of how the regular expression presented here works to match your input.
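
As an illustration of extending the pattern, here is a minimal sketch that also allows spaces and a few punctuation marks. The helper name and the exact set of extra characters are assumptions chosen for the example, not something the question specifies. It also uses \z instead of $, because in .NET $ matches just before a trailing newline as well as at the very end of the string.

```csharp
using System.Text.RegularExpressions;

public static class EnglishTextCheck
{
    // Letters, digits, space and a small, hypothetical set of punctuation.
    // The hyphen is placed last in the class so it is a literal, not a range.
    private static readonly Regex Allowed =
        new Regex(@"^[a-zA-Z0-9 .,!?'-]*\z");

    // Returns true when every character of input is in the allowed set.
    public static bool IsEnglishText(string input) => Allowed.IsMatch(input);
}
```

For example, EnglishTextCheck.IsEnglishText("Hello, world! It's 42.") is true, while "سلام" and "abc\n" are both rejected.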

Upvotes: 42

Danny A

Reputation: 21

bool onlyEnglishCharacters = !EnglishText.Any(a => a > '~');

Seems cheap, but it worked for me, legit easy answer. Hope it helps anyone.

Upvotes: 1

Erik A. Brandstadmoen

Reputation: 10588

I agree with the regular expression answers. However, you could simplify the pattern to just "^[\w]+$". \w matches any "word character". Note that it is equivalent to [a-zA-Z_0-9] only when RegexOptions.ECMAScript is specified; by default, .NET also matches Unicode letters and digits. I don't know if you want underscores as well.

More on regexes in .NET here: http://msdn.microsoft.com/en-us/library/ms972966.aspx#regexnet_topic8
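
A small sketch of how \w behaves in .NET with and without RegexOptions.ECMAScript; the class and method names are hypothetical and exist only for this example.

```csharp
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default behaviour: \w also matches Unicode letters and digits.
    public static bool MatchesDefault(string s) =>
        Regex.IsMatch(s, @"^\w+$");

    // ECMAScript behaviour: \w is restricted to [a-zA-Z_0-9].
    public static bool MatchesEcmaScript(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);
}
```

Here WordCharDemo.MatchesDefault("héllo") is true, whereas WordCharDemo.MatchesEcmaScript("héllo") is false; both accept "hello_1", underscore included.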

Upvotes: 0

Bhaskar

Reputation: 10691

If you don't want to use Regex, here is an alternative: check the ASCII code of each character. If it falls within one of the ranges below, it is either an English letter or a digit (this might not be the best solution):

foreach (char ch in str)
{
    int x = (int)ch;
    if ((x >= 65 && x <= 90) || (x >= 97 && x <= 122))
    {
       // this is an English letter: A-Z or a-z
    }
    else if (x >= 48 && x <= 57)
    {
       // this is a digit: 0-9
    }
    else
    {
       // this is something different
    }
}

See http://en.wikipedia.org/wiki/ASCII for the full ASCII table.

But I still think Regex is the best solution.

Upvotes: 0

PurplePilot

Reputation: 6612

Do you have web access? I assume that cannot be guaranteed, but if you do, Google has a language API that detects the language of the text you pass to it.

Upvotes: 2

Andrii Shvydkyi

Reputation: 2286

Something like this (if you want to control input):

static string ReadLettersAndDigits() {
    StringBuilder sb = new StringBuilder();
    ConsoleKeyInfo keyInfo;
    while ((keyInfo = Console.ReadKey(true)).Key != ConsoleKey.Enter) {
        char c = char.ToLower(keyInfo.KeyChar);
        if (('a' <= c && c <= 'z') || char.IsDigit(c)) {
            sb.Append(keyInfo.KeyChar);
            Console.Write(keyInfo.KeyChar); // echo the character exactly as it is stored
        }
    }
    return sb.ToString();
}

Upvotes: 0

Jørn Schou-Rode

Reputation: 38366

You could match it against this regular expression: ^[a-zA-Z0-9]*$

  • ^ matches the start of the string (i.e. no characters are allowed before this point)
  • [a-zA-Z0-9] matches any letter from a-z in lower or upper case, as well as digits 0-9
  • * lets the previous match repeat zero or more times
  • $ matches the end of the string (i.e. no characters are allowed after this point)

To use the expression in a C# program, you will need to import System.Text.RegularExpressions and do something like this in your code:

bool match = Regex.IsMatch(input, "^[a-zA-Z0-9]*$");

If you are going to test a lot of lines against the pattern, you might want to compile the expression:

Regex pattern = new Regex("^[a-zA-Z0-9]*$", RegexOptions.Compiled);

for (int i = 0; i < 1000; i++)
{
    string input = Console.ReadLine();
    pattern.IsMatch(input);
}

Upvotes: 20

James Curran

Reputation: 103555

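bool AllAscii(string str)
{
   // c < 128 keeps the check to ASCII; IsLetterOrDigit alone also accepts non-English letters
   return str.All(c => c < 128 && Char.IsLetterOrDigit(c));
}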

Upvotes: 0
