user3292965

Reputation: 79

Regular expression to cut a string at the pipe before the first semicolon

I am using Java regular expressions to cut down a pipe-separated record that ends in a list of semicolon-tagged properties:

2013-07-15 21:46:26|Dinner with James|Lucerne|MEDIATYPE;image|CATEGORY;25|365|423|IMGTOKEN;8adbfb5840349cac014052ded00f26da|TAGS;dinner|james|lucerne;

What I am trying to achieve is to:

  1. strip all characters after the first semicolon;
  2. cut the word before that semicolon (in this example MEDIATYPE);
  3. cut the pipe before that word.

Expected end result:

2013-07-15 21:46:26|Dinner with James|Lucerne

How could I do that with regular expressions?

Solved, thanks! The pattern (.*?)(?=\|[^|;]+;) worked for me.

Upvotes: 0

Views: 266

Answers (3)

alpha bravo

Reputation: 7948

Use this pattern:

(.*?)(?=\|[^|;]+;)
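
The answer gives only the pattern, so here is a minimal sketch of how it might be applied in Java. The class name LazyPrefixDemo and the helper prefix are hypothetical names chosen for illustration; the idea is to call find() once and read capture group 1, falling back to the whole input when no property segment is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself comes from the answer.
public class LazyPrefixDemo {
    // The pattern as a Java string literal (the backslash is doubled).
    private static final Pattern PREFIX = Pattern.compile("(.*?)(?=\\|[^|;]+;)");

    // Returns the text before the first "|NAME;" segment,
    // or the whole input if no such segment is found.
    static String prefix(String input) {
        Matcher m = PREFIX.matcher(input);
        return m.find() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        String text = "2013-07-15 21:46:26|Dinner with James|Lucerne|MEDIATYPE;image|CATEGORY;25|365|423|IMGTOKEN;8adbfb5840349cac014052ded00f26da|TAGS;dinner|james|lucerne;";
        System.out.println(prefix(text));
        // prints: 2013-07-15 21:46:26|Dinner with James|Lucerne
    }
}
```

The lazy group (.*?) grows one character at a time until the lookahead sees a pipe, a non-empty name without pipes or semicolons, and then a semicolon, so the match stops just before |MEDIATYPE;.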


Upvotes: 0

Jerry

Reputation: 71538

You could use a replace that matches from the first pipe whose following word ends in a semicolon, with no other pipe or semicolon in between. The raw regex I suggest is:

\|(?=[^|;]*;).*

As a Java string literal, that is:

\\|(?=[^|;]*;).*

An example:

String text = "2013-07-15 21:46:26|Dinner with James|Lucerne|MEDIATYPE;image|CATEGORY;25|365|423|IMGTOKEN;8adbfb5840349cac014052ded00f26da|TAGS;dinner|james|lucerne;";
String result = text.replaceAll("\\|(?=[^|;]*;).*", "");
System.out.println("Result: " + result);

which should give you:

2013-07-15 21:46:26|Dinner with James|Lucerne

Breakdown:

\\|      Match a literal pipe
(?=      Begin positive lookahead
  [^|;]* Zero or more characters other than pipe or semicolon
  ;      A semicolon
)        End positive lookahead
.*       Anything else on this line

The positive lookahead ensures that the pipe where the cut begins is followed by a semicolon, with no other pipe or semicolon between the two.
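
To see what the lookahead accepts and rejects, here is a small sketch that runs the same pattern over a few short inputs. The class name CutAtFirstPropertyDemo, the helper cut, and the sample strings are made up for illustration; the pattern is the one given above, compiled once so it can be reused.

```java
import java.util.regex.Pattern;

// Hypothetical demo class exercising the pattern from this answer.
public class CutAtFirstPropertyDemo {
    // Compiled once; replaceAll on a String would recompile it on every call.
    private static final Pattern TAIL = Pattern.compile("\\|(?=[^|;]*;).*");

    // Removes everything from the first qualifying pipe to the end of the line.
    static String cut(String line) {
        return TAIL.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // The pipe before "MEDIATYPE;" qualifies, so the cut starts there.
        System.out.println(cut("a|b|MEDIATYPE;image|TAGS;x|y;")); // a|b
        // No semicolon at all: no pipe satisfies the lookahead, nothing is removed.
        System.out.println(cut("a|b|c"));                         // a|b|c
        // [^|;]* may match zero characters, so a pipe directly followed by ';' also qualifies.
        System.out.println(cut("a|;b"));                          // a
    }
}
```

The last case is where this pattern differs from one that uses [^|;]+ in the lookahead, which would require at least one character between the pipe and the semicolon.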

Upvotes: 0

Boris the Spider

Reputation: 61128

So you want to split on the pipe (|) before the semi-colon?

This pattern will work:

\\|(?=[^|]*;)

Explanation

  • \\| a literal pipe character. The double backslash is Java string escaping.
  • (?=[^|]*;) a lookahead assertion that requires a semi-colon after the pipe, with any number of non-pipe characters between the two.

Example:

public static void main(final String[] args) {
    final String input = "2013-07-15 21:46:26|Dinner with James|Lucerne|MEDIATYPE;image|CATEGORY;25|365|423|IMGTOKEN;8adbfb5840349cac014052ded00f26da|TAGS;dinner|james|lucerne;";
    final String[] split = input.split("\\|(?=[^|]*;)");
    System.out.println(split[0]);
}

Output:

2013-07-15 21:46:26|Dinner with James|Lucerne
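
Since only the first piece is needed, one possible refinement is to pass a limit to split so it stops after the first match. The sketch below, with the hypothetical class name SplitDemo and helper firstPiece, prints every piece the unlimited split produces and then the limited result.

```java
// Hypothetical demo class; the regex is the one from this answer.
public class SplitDemo {
    static final String REGEX = "\\|(?=[^|]*;)";

    // Limit 2: split at the first qualifying pipe only and keep the rest as one piece.
    static String firstPiece(String input) {
        return input.split(REGEX, 2)[0];
    }

    public static void main(String[] args) {
        String input = "2013-07-15 21:46:26|Dinner with James|Lucerne|MEDIATYPE;image|CATEGORY;25|365|423|IMGTOKEN;8adbfb5840349cac014052ded00f26da|TAGS;dinner|james|lucerne;";
        // Without a limit, every pipe whose next field contains a semi-colon is a split point.
        for (String piece : input.split(REGEX)) {
            System.out.println(piece);
        }
        // Prints, one per line:
        // 2013-07-15 21:46:26|Dinner with James|Lucerne
        // MEDIATYPE;image
        // CATEGORY;25|365|423
        // IMGTOKEN;8adbfb5840349cac014052ded00f26da
        // TAGS;dinner|james
        // lucerne;
        System.out.println(firstPiece(input));
        // prints: 2013-07-15 21:46:26|Dinner with James|Lucerne
    }
}
```

If the input contains no qualifying pipe, split returns a one-element array holding the original string, so index 0 is still safe to read.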

Upvotes: 1
