McQuack

Reputation: 423

Sed string replace into file

I'm using sed because the files are large. I'd like to match strings of the form

'09/07/15 16:56:36,333000000','DD/MM/RR HH24:MI:SSXFF'

and replace them with

'09/07/15 16:56:36','DD/MM/RR HH24:MI:SS'

According to a regex tester, this regex seems to match:
'\d{2}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2},\d{9}','DD\/MM\/RR HH24:MI:SSXFF'

but when I do

sed -ie "s#\(\x27\d{2}\/\d{2}\/\d{2}\s\d{2}:\d{2}:\d{2}\),\d{9}  
\(\x27,\x27DD\/MM\/RR HH24:MI:SS\)XFF\x27#\1\2\x27#g" inputfile  

it does not replace anything. What am I missing?

Upvotes: 2

Views: 93

Answers (2)

werkritter

Reputation: 1679

NOTE: in the answer below I describe why your expression doesn't work in general. I would strongly suggest that you try to simplify your expression as much as possible first, or use @StevenPenny's excellent answer, because:

  • applying the changes described below in your present expression would turn it into a hulking, unmaintainable regex nightmare;
  • my remarks may not be exhaustive — they point out the cause, some of the particular problems, and sources for further investigation.

The problem is that sed and http://regexr.com/ use different regex engines. See the "RegEx engine" section on the website:

While the core feature set of regular expressions is fairly consistent, different implementations (ex. Perl vs Java) may have different features or behaviours.

RegExr uses your browser's RegExp engine for matching, and its syntax highlighting and documentation reflect the JavaScript RegExp standard.

The latest versions of GNU sed, by contrast, are mostly compatible with POSIX.2 Basic Regular Expressions (BREs). See this excerpt from the sed(1) manpage for GNU sed, version 4.2.2:

REGULAR EXPRESSIONS

POSIX.2 BREs should be supported, but they aren't completely because of performance problems. The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.

The POSIX regex languages (BRE, Basic Regular Expressions, and ERE, Extended Regular Expressions) are described in the regex(7) manpage.

In particular, concerning your expression:

  • Character class notation is different: for example, for digits you're using \d, while in BRE you should write [[:digit:]]; for white space, you're using \s, whereas in BRE there's [[:space:]].
  • Some characters are literal by default in BRE and need a backslash to take on their special meaning. That applies to the interval braces: {2} has to be written \{2\}.
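
Putting these points together, a BRE rendering of the expression from the question might look like the sketch below. It is only an illustration: the function name fix_timestamps and the variable line are made up for the example, and the sample value is the one from the question. Using # as the delimiter means the slashes need no escaping, and inside double quotes the single quotes can be written literally instead of as \x27.

```shell
#!/bin/sh
# Sample line in the format shown in the question.
line="'09/07/15 16:56:36,333000000','DD/MM/RR HH24:MI:SSXFF'"

# Hypothetical helper: reads stdin, writes the rewritten text to stdout.
# BRE spelling: [[:digit:]] for \d, [[:space:]] for \s, \{n\} for {n}.
# Group 1 keeps the timestamp up to the seconds; group 2 keeps the
# format string up to SS. The fractional digits and XFF are dropped.
fix_timestamps() {
  sed "s#\('[[:digit:]]\{2\}/[[:digit:]]\{2\}/[[:digit:]]\{2\}[[:space:]][[:digit:]]\{2\}:[[:digit:]]\{2\}:[[:digit:]]\{2\}\),[[:digit:]]\{9\}\(','DD/MM/RR HH24:MI:SS\)XFF'#\1\2'#g"
}

printf '%s\n' "$line" | fix_timestamps
```

As a side note, GNU sed reads -ie as -i with the backup suffix e, so the original command would leave a backup named inputfilee. If in-place editing is wanted, -i -e or -i.bak is probably closer to the intent.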

Upvotes: 0

Zombo

Reputation: 1

Why not just use something like this?

#!/usr/bin/sed -f
s/,[[:digit:]]*//
s/XFF//
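
One way to try this out is sketched below, assuming the two commands are saved to a file and applied in place. The names fix.sed, inputfile and workdir are hypothetical, and the sample line is the one from the question.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The two substitutions from the script above, saved as fix.sed.
cat > fix.sed <<'EOF'
s/,[[:digit:]]*//
s/XFF//
EOF

# One sample line in the format from the question.
printf '%s\n' "'09/07/15 16:56:36,333000000','DD/MM/RR HH24:MI:SSXFF'" > inputfile

# -i.bak edits inputfile in place and keeps the original as inputfile.bak.
sed -i.bak -f fix.sed inputfile
cat inputfile
```

Without the g flag each s command replaces only the first match on a line, so this relies on the timestamp's comma being the first comma on the line and on XFF appearing once.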

Upvotes: 2
