Sathi Chowdhury

Reputation: 11

Java regex: need one regex to match all the formats specified

A log file has these patterns appearing more than once in a line. For example, the file may look like:

dsads utc-hour_of_year:2013-07-30T17 jdshkdsjhf utc-week_of_year:2013-W31 dskjdskf
utc-week_of_year:2013-W31 dskdsld  fdsfd
dshdskhkds utc-month_of_year:2013-07 gfdkjlkdf

I want to replace all date-specific info with "Y".

I tried replaceAll("_year:.*\\s", "_year:Y "); but it removes everything that occurs after the first replacement, due to the greedy match of .*:

dsads utc-hour_of_year:Y
utc-week_of_year:Y
dshdskhkds utc-month_of_year:Y

but the expected result is:

dsads utc-hour_of_year:Y jdshkdsjhf utc-week_of_year:Y dskjdskf
utc-week_of_year:Y dskdsld  fdsfd
dshdskhkds utc-month_of_year:Y gfdkjlkdf

Upvotes: 0

Views: 104

Answers (2)

Pshemo

Reputation: 124215

I am not sure what you are really trying to do, so this answer is based only on your example. If you want something else, leave a comment below or edit your question with a more specific example.

It removes everything after _year: because you are using .*\\s, which means

  • .* zero or more of any character (except line terminators),
  • \\s followed by a single whitespace character

so in the line

utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf

it will match

utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf
//               ^from here                                to here^

because by default the * quantifier is greedy. To make it reluctant, add ? after *, so you could try

  • "_year:.*?\\s"

or, even better, instead of .*? match only non-whitespace characters using \\S, which is the negation of \\s and can also be written as [^\\s]. Also, if the date can appear at the end of your input, you probably shouldn't require \\s at the end of the regex (or add a space in the replacement), so try one of these:

  • .replaceAll("_year:\\S*", "_year:Y")
  • .replaceAll("_year:\\S*\\s", "_year:Y ")

Upvotes: 1

arshajii

Reputation: 129477

Try using a reluctant quantifier: _year:.*?\s.

.replaceAll("_year:.*?\\s", "_year:Y ")

System.out
        .println("utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf"
                .replaceAll("_year:.*?\\s", "_year:Y "));
utc-hour_of_year:Y dsfsdgfsgf utc-week_of_year:Y dsfsdgfsdgf
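One caveat that may be worth checking with this pattern: it requires a whitespace character after the date, and \s also matches a newline. The sketch below (class and method names are hypothetical) shows what happens when the date is the last thing on a line, compared with a pattern that matches only the non-whitespace run.

```java
public class TrailingTokenDemo {

    // Reluctant pattern that still requires one whitespace character after the date.
    static String withTrailingSpace(String s) {
        return s.replaceAll("_year:.*?\\s", "_year:Y ");
    }

    // Pattern that matches only the non-whitespace characters after "_year:".
    static String withoutTrailingSpace(String s) {
        return s.replaceAll("_year:\\S*", "_year:Y");
    }

    public static void main(String[] args) {
        // Date at the very end of the input: no whitespace follows it.
        String endOfInput = "dshdskhkds utc-month_of_year:2013-07";
        System.out.println(withTrailingSpace(endOfInput));    // unchanged, no match
        System.out.println(withoutTrailingSpace(endOfInput)); // dshdskhkds utc-month_of_year:Y

        // Date at the end of a line inside a multi-line string:
        // \s matches the '\n', and the replacement turns it into a space.
        String twoLines = "a utc-week_of_year:2013-W31\nb";
        System.out.println(withTrailingSpace(twoLines));      // a utc-week_of_year:Y b  (lines joined)
        System.out.println(withoutTrailingSpace(twoLines));   // line break preserved
    }
}
```

If the log is processed one line at a time without trailing newlines, only the first case applies.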

Upvotes: 1
