Reputation: 579
I have a string formed from names (without spaces) separated by periods. Each token (after a period) can start with a [a-zA-Z_] or a [ (and ends with a ]) or a $ (and ends with a $).
Examples:
So I need to split the string by periods, but in the last two examples I don't want to split the 4.45 (or abc.def). Anything surrounded by $ should not be split.
For the last two examples I just want an array like this:
or
I have tried to use a regex, but I got it completely wrong.
I was just informed that after the closing $ there could be another string surrounded by < and >, which can again contain dots that I should not split on:
House.Car.$abc.def$<ghi.jk>.[0].bla
And I need to get it like:
House
Car
$abc.def$<ghi.jk>
[0]
bla
Thanks for your help.
Upvotes: 1
Views: 232
Reputation: 124265
If not using regex is an option, then you can write your own parser which iterates once over all characters in your string, checking whether each character is inside $...$, [...] or <...>.
If you find a . and it is inside one of the previously mentioned "areas", then you just add it to the token you are building, like any ordinary character.
But if you find a . and you are outside of these areas, you need to split on it, which means adding the currently built token to the result and clearing the buffer for the next token.
Such a parser can look like this:
// needs: import java.util.ArrayList; import java.util.List;
public static List<String> parse(String input) {
    // list which will hold the returned tokens
    List<String> tokens = new ArrayList<>();

    // flags representing whether the currently tested character is inside
    // one of the special areas
    // (at the start we are outside of these areas, so they are set to false)
    boolean insideDollar = false;         // $...$
    boolean insideSquareBrackets = false; // [...]
    boolean insideAngleBrackets = false;  // <...>

    // we need a buffer to build tokens; StringBuilder is a good fit here
    StringBuilder sb = new StringBuilder();

    // now let's iterate over all characters and decide whether to add them
    // to the current token or to add the token to the result list
    for (char ch : input.toCharArray()) {

        // first update which area we are in:
        // finding $ means that we either start or end a `$...$` area, so
        // simply negating the flag is enough to update its status
        if (ch == '$') insideDollar = !insideDollar;
        // the remaining flags are set on the opening character and
        // cleared on the closing one
        else if (ch == '[') insideSquareBrackets = true;
        else if (ch == ']') insideSquareBrackets = false;
        else if (ch == '<') insideAngleBrackets = true;
        else if (ch == '>') insideAngleBrackets = false;

        // now we know which area we are in, so let's handle the cases:
        // if the character is not a dot,
        // OR it is a dot but we are inside one of the areas, we
        // just add it to the token (append it to the StringBuilder)
        if (ch != '.' || insideAngleBrackets || insideDollar || insideSquareBrackets) {
            sb.append(ch);
        } else {
            // otherwise we are handling a dot outside of the special areas,
            // where it is a separator: we don't add it to the token, but
            // add the buffer's value (the current token) to the results and
            // reset the buffer for the next token
            tokens.add(sb.toString());
            sb.setLength(0);
        }
    }

    // since we only add the buffer's value to the list of tokens when we
    // find a dot to split on, the last token has not been added yet if
    // there is no dot after it, so we need to add it manually after
    // iterating over all characters
    if (sb.length() > 0) // a non-empty token needs to be added to the result
        tokens.add(sb.toString());

    return tokens;
}
You can use it like this:
String input = "House.Car2.$abc.def$<ghi.jk>.[0]";
for (String s : parse(input))
    System.out.println(s);
Output:
House
Car2
$abc.def$<ghi.jk>
[0]
Upvotes: 1
Reputation: 121780
You are better off collecting the results by "walking" the string and matching with .find():
// needs: java.util.ArrayList, java.util.List,
//        java.util.regex.Matcher, java.util.regex.Pattern

// Note the alternation
private static final Pattern PATTERN
    = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

public List<String> matchesForInput(final String input)
{
    final Matcher m = PATTERN.matcher(input);
    final List<String> matches = new ArrayList<>();

    while (m.find())
        matches.add(m.group());

    return matches;
}
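As a quick check of how this behaves, here is a self-contained sketch that wraps the same pattern in a hypothetical class (FindWalkDemo and its main method are illustrative names, not part of the answer) and runs it on two inputs from the question. Note that the pattern as written only protects dots inside $...$; it has no rule for a trailing <...>, so the later example from the question is still split inside the angle brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class for illustration; only the pattern is from the answer.
class FindWalkDemo {
    // Either a $...$ group whose parts are separated by dots, or a run of non-dots.
    private static final Pattern PATTERN =
        Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

    static List<String> matchesForInput(String input) {
        Matcher m = PATTERN.matcher(input);
        List<String> matches = new ArrayList<>();
        // find() resumes after the previous match, skipping separator dots
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [House, Car2, $4.45$, [0]]
        System.out.println(matchesForInput("House.Car2.$4.45$.[0]"));
        // prints [House, Car, $abc.def$, <ghi, jk>, [0], bla]
        System.out.println(matchesForInput("House.Car.$abc.def$<ghi.jk>.[0].bla"));
    }
}
```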
Upvotes: 2
Reputation: 71578
It will be easier with Pattern/Matcher, I believe. Raw regex:
\$[^$]+\$|\[[^\]]+\]|[^.]+
In code:
String s = "House.Car2.$4.45$.[0]";
Pattern pattern = Pattern.compile("\\$[^$]+\\$|\\[[^\\]]+\\]|[^.]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output:
House
Car2
$4.45$
[0]
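The question was later extended so that a $...$ token may be followed by a <...> part that also contains dots. One possible way to cover that, as a sketch rather than a tested general solution, is to append an optional (?:<[^>]+>)? group to the first alternative. The class name DollarAngleSplitDemo and the method tokenize are hypothetical, and the sketch assumes the <...> part only ever appears directly after a closing $ and that delimiters are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the regex above plus one optional group.
class DollarAngleSplitDemo {
    // Alternatives, tried in order at each position:
    //   1. $...$ optionally followed directly by <...>
    //   2. [...]
    //   3. a run of characters that are not dots
    private static final Pattern TOKEN = Pattern.compile(
        "\\$[^$]+\\$(?:<[^>]+>)?|\\[[^\\]]+\\]|[^.]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints House, Car, $abc.def$<ghi.jk>, [0], bla on separate lines
        for (String token : tokenize("House.Car.$abc.def$<ghi.jk>.[0].bla")) {
            System.out.println(token);
        }
    }
}
```

The order of the alternatives matters: the plain [^.]+ branch has to come last, otherwise it would consume the $ and [ tokens up to their first inner dot.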
Upvotes: 1