Reputation: 1486
I have a list of nationalities recorded against people's entries. Most of the entries are given properly, but some are not. The proper ones look like this:
German
Iranian
Qatar
The improper ones look like this:
Possibly Ethiopian
Lebanon citizenship
DRC and Belgian nationalities
(1) Germany (b) Algeria
(a) Russian (b) Georgia
a) French, b) Tunisian
Indonesian (as at December 2003)
Iranian (Iranian citizenship)
Sudanese by birth
(1) Russian (2) USSR (until 1991)
Bahrain (citizenship revoked in January 2015)
United States of America. Also believed to hold Syrian nationality
Tunisian (dual nationality)
(1) German (2) Moroccan
1) Saudi Arabia 2) Qatar
a) Central African Republic b) South Sudan
Iranian national and US national/citizen
Kuwaiti citizenship withdrawn in 2002
I need to extract only the bold text (the nationalities) from the given text. The nationality can be that of any country; these are just samples.
How can I apply a regex, or some other kind of condition, that gives the expected results? I tried checking whether the text contains certain characters and then splitting on them, but that would need more than 20 conditions, which is not a good approach.
// listOfNationalities is the raw input list of nationality strings.
List<string> multiple = new List<string>();
foreach (var nationality in listOfNationalities)
{
    if (nationality.Contains("(1)"))
    {
        string[] nat = nationality.Split(')');
        foreach (var item in nat)
        {
            multiple.Add(item);
        }
    }
}
Upvotes: 0
Views: 633
Reputation: 376
If the nationalities come from a fixed list of available options, you can do the following:
// listOfNationalities is the raw input list of nationality strings.
List<string> validNationalities = new List<string>();
validNationalities.Add("Brazilian");
validNationalities.Add("Japanese");
validNationalities.Add("...");
// Where returns an IEnumerable<string>, so ToList() is needed here.
List<string> multiple = listOfNationalities.Where(n => validNationalities.Contains(n)).ToList();
or even simpler:
// Join all input strings into one, then look for each valid nationality in it.
string joinedInput = string.Join("|", listOfNationalities);
List<string> validNationalities = new List<string>();
validNationalities.Add("Brazilian");
validNationalities.Add("Japanese");
validNationalities.Add("...");
List<string> multiple = validNationalities.Where(n => joinedInput.Contains(n)).ToList();
This way you get every valid nationality that appears anywhere in the input, including both nationalities of an entry that lists two.
Upvotes: 2
Reputation: 19641
If you already have a list of valid nationalities, and if the nationalities don't include special characters, you can use something like the following to create the regex pattern at runtime:
public string NationalitiesPattern;

public string GetNationalitiesPattern()
{
    List<string> listOfNationalities = // All valid nationalities.
    string joinedNationalities = string.Join("|", listOfNationalities);
    return $@"\b(?:{joinedNationalities})\b"; // "\b(?:German|Iranian|Qatar|etc)\b"
}
And then you can use it like this:
if (string.IsNullOrEmpty(NationalitiesPattern))
    NationalitiesPattern = GetNationalitiesPattern();

MatchCollection matches = Regex.Matches(inputString, NationalitiesPattern);
foreach (Match m in matches)
    Console.WriteLine(m.Value);
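If some names do contain regex metacharacters, or one name is a prefix of another (for example "United States" and "United States of America"), a variation along these lines might help. This is only a sketch under those assumptions: the `NationalityExtractor` class and its `BuildPattern` and `Extract` methods are names made up for illustration, and the list of valid nationalities still has to come from you. It escapes each name with `Regex.Escape`, puts longer names first in the alternation, and removes repeats such as "Iranian (Iranian citizenship)".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class NationalityExtractor
{
    // Builds a single alternation pattern from the known nationalities.
    public static Regex BuildPattern(IEnumerable<string> validNationalities)
    {
        var alternatives = validNationalities
            // Longer names first, so a longer name is preferred over a
            // shorter one that is its prefix.
            .OrderByDescending(n => n.Length)
            // Escape metacharacters so each name is matched literally.
            .Select(n => Regex.Escape(n));

        return new Regex($@"\b(?:{string.Join("|", alternatives)})\b");
    }

    // Returns the distinct nationalities found in one input string,
    // in order of first appearance.
    public static List<string> Extract(Regex pattern, string input)
    {
        return pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }
}
```

Build the pattern once with `NationalityExtractor.BuildPattern(validNationalities)` and call `Extract` for each entry. Because of the `\b` anchors, "Sudan" in the list will not match inside "Sudanese", so both the country name and the adjective need to be in the list if you want both forms recognised.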
Upvotes: 0