keeplearning
keeplearning

Reputation: 379

How to exclude specific pattern while using sed

I have a file which is space separated, I need to convert it to pipe delimited but when it encounters specific patterns in each line, it needs to exclude those patterns while doing sed.

a char(30) NOT NULL
b LARGEINT NOT NULL
c TIMESTAMP
d numeric(10, 3)

Expected output

a|char(30)|NOT NULL
b|LARGEINT|NOT NULL
c|TIMESTAMP|
d|numeric(10, 3)|

While replacing the space, need to ignore space in (10, 3) and NOT NULL. I tried the below but it didn't work

sed -ri '|, |!s|\ /\|/g' abc.txt

Any help in this regard is appreciated.

Upvotes: 1

Views: 13207

Answers (4)

Claes Wikner
Claes Wikner

Reputation: 1517

awk '/^[cd]/{$NF=$NF"|"}{sub(/ /,"|")sub(/ N/,"|N")}1' file

a|char(30)|NOT NULL
b|LARGEINT|NOT NULL
c|TIMESTAMP|
d|numeric(10, 3)|

What happens? It starts with adding pipes at the end of the line beginning with c or d.

The first sub replace pipe in very first empty space area.

And the second sub put pipe in front of NOT.

Upvotes: 0

Sundeep
Sundeep

Reputation: 23667

If I understood the question correctly, these are requirements:

  • Convert input file into three column output with | as delimiter
  • The third field may be empty
  • Input is space delimited, but
    • Third field may contain spaces
    • Second field in input may contain text within () which may contain spaces

Following would work for given sample

$ cat ip.txt 
a char(30) NOT NULL
b LARGEINT NOT NULL
c TIMESTAMP
d numeric(10, 3)

$ sed -E 's/ +/|/; s/\) */)|/; /\)/!s/ +|$/|/' ip.txt 
a|char(30)|NOT NULL
b|LARGEINT|NOT NULL
c|TIMESTAMP|
d|numeric(10, 3)|
  • s/ +/|/ change first occurrence of one or more spaces to |
  • s/\) */)|/ handle the troublesome second field first. Change ) and optional spaces to )|
    • of course, assumes, no other field contains ()
  • /\)/!s/ +|$/|/ for remaining lines, if it doesn't contain ), change first occurrence of one or more spaces or end of line to |

Upvotes: 2

James Brown
James Brown

Reputation: 37394

In GNU awk using FPAT:

$ awk '
BEGIN {
    FPAT="([^ ]+)|([^ ]*NOT NULL[^ ]*)|([^ ]*numeric([^)]*)[^ ]*)"  # set FPAT
    OFS="|"                                                         # set OFS
}
{ NF=3; $1=$1 }                                                     # rebuild record (1)
1' file                                                             # and output
a|char(30)|NOT NULL
b|LARGEINT|NOT NULL
c|TIMESTAMP|
d|numeric(10, 3)|

(1) NF=3 sets field count statically to 3 and causes the pipes in the ends of the c and d records.

Upvotes: 3

David C. Rankin
David C. Rankin

Reputation: 84531

It's not particularly pretty, but a two expression sed expression will work,

$ sed -e 's/\([^,T]\)[ ]/\1\|/g' file.txt | sed -e 's/\([^O]T\)[ ]/\1\|/'
a|char(30)|NOT NULL
b|LARGEINT|NOT NULL
c|TIMESTAMP
d|numeric(10, 3)

If you really want the pipe after TIMESTAMP and ), you can add a third to the end, e.g.

$ sed -e 's/\([^,T]\)[ ]/\1\|/g' file.txt | sed -e 's/\([^O]T\)[ ]/\1\|/' \
-e 's/\([^L]\)$/\1\|/'
a|char(30)|NOT NULL
b|LARGEINT|NOT NULL
c|TIMESTAMP|
d|numeric(10, 3)|

But I took the final pipes to be trailing spaces in your input file not present in mine. Either way, it's another way to skin-this-cat.

Upvotes: 1

Related Questions