Reputation: 8983
Lets say I have several short string:
string[] shortStrings = new string[] {"xxx","yyy","zzz"};
(this definition can change length on array and on string too, so not a fixed one)
When a given string, I like to check if it combines with the shortStrings ONLY, how?
let say function is like bool TestStringFromShortStrings(string s)
then
TestStringFromShortStrings("xxxyyyzzz") = true;
TestStringFromShortStrings("xxxyyyxxx") = true;
TestStringFromShortStrings("xxxyyy") = true;
TestStringFromShortStrings("xxxxxx") = true;
TestStringFromShortStrings("xxxxx") = false;
TestStringFromShortStrings("xxxXyyyzzz") = false;
TestStringFromShortStrings("xxx2yyyxxx") = false;
Please suggest a memory not tense and relatively fast method.
[EIDT] What this function for?
I will personally use this function to test if a string is a combination of a PINYIN ok, some Chinese stuff. Following Chinese are same thing if you cannot read it.
检测一个字符串是否为汉语拼音(例如检测是否拼音域名) 所有的汉语拼音字符串有:
(To detect whether a string is Hanyu Pinyin (e.g. detect the phonetic domain) of the Pinyin string:)
Regex PinYin = new Regex(@"^(a|ai|an|ang|ao|ba|bai|ban|bang|bao|bei|ben|beng|bi|bian|biao|bie|bin|bing|bo|bu|ca|cai|can|cang|cao|ce|cen|ceng|cha|chai|chan|chang|chao|che|chen|cheng|chi|chong|chou|chu|chua|chuai|chuan|chuang|chui|chun|chuo|ci|cong|cou|cu|cuan|cui|cun|cuo|da|dai|dan|dang|dao|de|den|dei|deng|di|dia|dian|diao|die|ding|diu|dong|dou|du|duan|dui|dun|duo|e|ei|en|eng|er|fa|fan|fang|fei|fen|feng|fo|fou|fu|ga|gai|gan|gang|gao|ge|gei|gen|geng|gong|gou|gu|gua|guai|guan|guang|gui|gun|guo|ha|hai|han|hang|hao|he|hei|hen|heng|hong|hou|hu|hua|huai|huan|huang|hui|hun|huo|ji|jia|jian|jiang|jiao|jie|jin|jing|jiong|jiu|ju|juan|jue|jun|ka|kai|kan|kang|kao|ke|ken|keng|kong|kou|ku|kua|kuai|kuan|kuang|kui|kun|kuo|la|lai|lan|lang|lao|le|lei|leng|li|lia|lian|liang|liao|lie|lin|ling|liu|long|lou|lu|lv|luan|lue|lve|lun|luo|ma|mai|man|mang|mao|me|mei|men|meng|mi|mian|miao|mie|min|ming|miu|mo|mou|mu|na|nai|nan|nang|nao|ne|nei|nen|neng|ni|nian|niang|niao|nie|nin|ning|niu|nong|nou|nu|nv|nuan|nuo|nun|ou|pa|pai|pan|pang|pao|pei|pen|peng|pi|pian|piao|pie|pin|ping|po|pou|pu|qi|qia|qian|qiang|qiao|qie|qin|qing|qiong|qiu|qu|quan|que|qun|ran|rang|rao|re|ren|reng|ri|rong|rou|ru|ruan|rui|run|ruo|sa|sai|san|sang|sao|se|sen|seng|sha|shai|shan|shang|shao|she|shei|shen|sheng|shi|shou|shu|shua|shuai|shuan|shuang|shui|shun|shuo|si|song|sou|su|suan|sui|sun|suo|ta|tai|tan|tang|tao|te|teng|ti|tian|tiao|tie|ting|tong|tou|tu|tuan|tui|tun|tuo|wa|wai|wan|wang|wei|wen|weng|wo|wu|xi|xia|xian|xiang|xiao|xie|xin|xing|xiong|xiu|xu|xuan|xue|xun|ya|yan|yang|yao|ye|yi|yin|ying|yo|yong|you|yu|yuan|yue|yun|za|zai|zan|zang|zao|ze|zei|zen|zeng|zha|zhai|zhan|zhang|zhao|zhe|zhei|zhen|zheng|zhi|zhong|zhou|zhu|zhua|zhuai|zhuan|zhuang|zhui|zhun|zhuo|zi|zong|zou|zu|zuan|zui|zun|zuo)+$");
用下面的正则表达式方法,试过了,最简单而且效果非常好,就是有点慢:( 递归的方式对长字符串比较麻烦,容易内存溢出
(Tried it with the regular expression: it's the most simple and gives very good results, but it's a bit slow. The recursive way on the long string is too much trouble, it's too easy to overflow the stack.)
Upvotes: 1
Views: 553
Reputation: 50326
You could compare the start of the input string with each of the short strings. As soon as you have a match, you take the rest of the string and repeat. As soon as you have no more string left, you're done. For example:
string[] shortStrings = new string[] { "xxx", "yyy", "zzz" };
bool Test(string input)
{
if (input.Length == 0)
return true;
foreach (string shortStr in shortStrings)
{
if (input.StartsWith(shortStr))
{
if (Test(input.Substring(shortStr.Length)))
return true;
}
}
return false;
}
You might optimize this by removing the recursion, or by sorting the short strings and do a binary instead of a linear search.
Here is a non-recursive version, that uses a Stack
object instead. No chance of getting a StackOverflowException
:
string[] shortStrings = new string[] { "xxx", "yyy", "zzz" };
bool Test(string input)
{
Stack<string> stack = new Stack<string>();
stack.Push(input);
while (stack.Count > 0)
{
string str = stack.Pop();
if (str.Length == 0)
return true;
foreach (string shortStr in shortStrings)
{
if (str.StartsWith(shortStr))
stack.Push(str.Substring(shortStr.Length));
}
}
return false;
}
Upvotes: 2
Reputation: 2436
Edit: Simplified this a lot thanks to L.B and millimoose.
Regular Expressions to the rescue! Using System.Text.RegularExpressions.Regex
, we get:
public static bool TestStringFromShortStrings(string checkText, string[] pieces) {
// Build the expression. Ultimate result will be
// of the form "^(xxx|yyy|zzz)+$".
var expr = "^(" +
String.Join("|", pieces.Select(Regex.Escape)) +
")+$";
// Check whether the supplied string matches the expression.
return Regex.IsMatch(checkText, expr);
}
This should be able to properly handle cases that have multiple repeated patterns of different lenghts. E.g. if you the list of possible pieces includes strings "xxx"
and "xxxx"
.
Upvotes: 4
Reputation: 2554
Copy the target string to string builder. For each string in shortstring array, remove all occurences from target. If u end up in zero length string, true else false.
Edit: This approach is not correct. Please refer to comments. Keeping this answer still here as it may look reasonably correct initially.
Upvotes: 2