YoungHobbit

Reputation: 13402

Pig: Regular Expression syntax

I am doing a string comparison with a regular expression in a Pig script.

I know that regular expressions in Pig use the same syntax as Java.

The problem I am facing: I need to filter out every record whose field has trailing white space.

My regular expression is: (fname matches '!\\s+$')

Sample Script-----

raw_data = load '$input' using PigStorage(',') as (fname:chararray);
filter_data = filter raw_data by (fname matches '!\\s+$');
dump filter_data;

Sample Input-----

abcd    ,123
pqrs,234
xyz ,234
lmn,2345

Nothing is written to STDOUT, whereas I expected it to print "pqrs" and "lmn".

Upvotes: 0

Views: 546

Answers (1)

zx81

Reputation: 41838

I don't know Pig, but in Java one syntactically correct regex that would match pqrs,234 and lmn,2345 is:

^\S+$

assuming you were in multiline mode.

  • In a Java string literal you escape backslashes, so that becomes ^\\S+$
  • In Java you can turn on multiline mode with (?m), so the regex could be (?m)^\\S+$

See demo.
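For what it's worth, here is a minimal Java sketch of both variants, run against the sample data from the question. The class and method names (TrailingWhitespaceDemo, hasNoWhitespace, linesWithoutWhitespace) are made up for illustration. Two things it relies on: in a Java regex, ! is a literal exclamation mark and not a negation operator, and String.matches only succeeds when the whole string matches the pattern. That is why a pattern like !\s+$ never matches a value such as "pqrs".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class TrailingWhitespaceDemo {

    // Whole-string match: true only when the field consists of
    // one or more non-whitespace characters and nothing else.
    static boolean hasNoWhitespace(String field) {
        return field.matches("\\S+");
    }

    // Multiline search: (?m) makes ^ and $ match at line boundaries,
    // so find() returns each line that contains no whitespace at all.
    static List<String> linesWithoutWhitespace(String text) {
        Matcher m = Pattern.compile("(?m)^\\S+$").matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // First column of the sample input, as PigStorage(',') would split it.
        String[] fields = {"abcd    ", "pqrs", "xyz ", "lmn"};
        for (String f : fields) {
            System.out.println("[" + f + "] -> " + hasNoWhitespace(f));
        }

        // The raw sample input as one multi-line string.
        String input = "abcd    ,123\npqrs,234\nxyz ,234\nlmn,2345";
        System.out.println(linesWithoutWhitespace(input));
    }
}
```

If Pig's matches operator applies the same whole-string rule as Java's String.matches, the filter in the question could probably be written as fname matches '\\S+' with no anchors or (?m) needed, since each field is tested on its own. I have not tested that in Pig.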

Upvotes: 1
