Reputation: 5526
I have a file with the following format:
1 LOAD INTO TABLE
2 TBLNAME
3 (
4 FLDR_NUM POSITION( 1 )
5 INTEGER ,
4 FLDR_NUM POSITION( 5 )
5 INTEGER
6 )
I need to get rid of the line numbers, read the field info, and build a JSON-like structure. As a first step, I am doing:
#!/bin/bash
count=1
while read -r line || [ -n "$line" ]
do
    name=$(sed -e 's/^[0-9][0-9]?\s*//' <<< "$line")
    count=$((count + 1))
    # if [ "$count" -gt 3 ]
    # then
    echo "Name $name"
    # fi
done < "$1"
Here is what I am trying to achieve: remove everything up to the first non-whitespace character after the line number. E.g., for line 4:
FLDR_NUM POSITION( 1 )
Updated the regex.
Upvotes: 1
Views: 561
Reputation: 29941
Sed thinks you're using the \( metacharacter (which opens a group). To match a literal (, you just need to stop escaping it:
sed -e 's/^[0-9][0-9]\?\s*(*//'
It seems like this is the command you want: s/^[0-9][0-9]\?\s*//
It will remove any one- or two-digit number at the beginning of a line, followed by any amount of whitespace. Note that in sed's default (basic) regex syntax a bare ? is a literal question mark, so the optional digit has to be written \? in GNU sed (or use sed -E and keep the plain ?). If you want to match one or more digits (instead of just one or two), change [0-9][0-9]\? to [0-9]\+.
$ sed -e 's/^[0-9]\+\s*//' < example.txt
LOAD INTO TABLE
TBLNAME
(
FLDR_NUM POSITION( 1 )
INTEGER ,
FLDR_NUM POSITION( 5 )
INTEGER
)
EDIT: according to @ghoti, this isn't portable to every sed implementation (\+, \? and \s are not part of POSIX basic regular expressions).
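To see why the escaping matters, here is a minimal sketch that runs the same pattern through sed in basic and extended mode. It assumes a sed that accepts -E (GNU and BSD sed both do); the variable names are only for illustration.

```shell
#!/bin/bash
# Sample line taken from the question.
line='4 FLDR_NUM POSITION( 1 )'

# Basic regex (sed's default): a bare ? is an ordinary character, so this
# pattern looks for a literal "?" and the line passes through unchanged.
unchanged=$(printf '%s\n' "$line" | sed -e 's/^[0-9][0-9]?[[:space:]]*//')

# Extended regex (-E): ? now means "zero or one", so the number is removed.
stripped=$(printf '%s\n' "$line" | sed -E -e 's/^[0-9][0-9]?[[:space:]]*//')

printf 'BRE: %s\n' "$unchanged"
printf 'ERE: %s\n' "$stripped"
```

The first line of output still carries the leading 4, while the second starts at FLDR_NUM.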
Upvotes: 1
Reputation: 6401
This isn't going to be easy in sed. I mean, getting rid of the leading numbers and whitespace is easy, but the rest of what you want to do is going to be tough.
I would be more likely to choose awk:
awk -F'[ )(]+' '
NF==2 && /[A-Z]/ {print "{ " $2 " => " } # TBLNAME
NF==5 && $2!="LOAD" {fldr_num=$2; pos=$4 } # FLDR_NUM/POSITION
NF==3 {print "\t" $2 "/" fldr_num "/" pos "," } # INTEGER
END {print "}" } # Right brace
' infile.foo
It's not exactly what you're looking for, but it illustrates the basics of extracting the information you're interested in and reformatting/rearranging it.
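If actual JSON-like output is the goal, the same idea can be pushed a little further. The following is only a sketch under assumptions not stated in the question: the table name is on the line after LOAD INTO TABLE, every field is a NAME POSITION( n ) line followed by a type line, and the function name to_json and the keys table, fields, name, position and type are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads the numbered listing on stdin and prints a
# JSON-like summary on one line.
to_json() {
  awk '
    { sub(/^[0-9]+[ \t]*/, "") }                 # drop the leading line number
    /^LOAD INTO TABLE/ { want_table = 1; next }  # table name is on the next line
    want_table { table = $1; want_table = 0; next }
    /POSITION\(/ {                               # e.g. FLDR_NUM POSITION( 1 )
      name = $1
      pos = $0
      sub(/^.*POSITION\([ \t]*/, "", pos)        # keep what follows the paren
      sub(/[ \t]*\).*$/, "", pos)                # and cut at the closing paren
      next
    }
    name != "" && /^[A-Z]/ {                     # type line, e.g. INTEGER ,
      fields[++n] = sprintf("{\"name\": \"%s\", \"position\": %s, \"type\": \"%s\"}", name, pos, $1)
      name = ""
    }
    END {
      printf "{\"table\": \"%s\", \"fields\": [", table
      for (i = 1; i <= n; i++) printf "%s%s", (i > 1 ? ", " : ""), fields[i]
      print "]}"
    }
  '
}

to_json <<'EOF'
1 LOAD INTO TABLE
2 TBLNAME
3 (
4 FLDR_NUM POSITION( 1 )
5 INTEGER ,
4 FLDR_NUM POSITION( 5 )
5 INTEGER
6 )
EOF
```

Because the type line is recognised by "a field name is pending" rather than by its field count, the last INTEGER (which has no trailing comma) is handled the same way as the first.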
Hope this helps.
Upvotes: 0
Reputation: 189317
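If the line numbers are fixed width, just cut them off by column (this assumes the real content starts at column 11; adjust the number to match your file):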
cut -c11- file >file.new
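As a small sketch of how the column number is chosen, the helper below (strip_fixed is a made-up name) takes the first column to keep as its argument. The sample input is padded so that the number field is exactly ten characters wide, which is an assumption for the demo and not something the question states.

```shell
#!/bin/sh
# Hypothetical helper: keep everything from column $1 onward on each line.
strip_fixed() {
  cut -c"$1"-
}

# A 10-character number field means the content begins at column 11.
printf '%-10s%s\n' 1 'LOAD INTO TABLE' | strip_fixed 11
```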
If your final target is some sort of parsed JSON output, then whatever you are using to do the actual parsing is probably also very well equipped to skip the line numbers.
Upvotes: 0
Reputation: 46826
You can do this in many ways. One of the easiest might be to use bash alone:
$ while read -r num line; do echo "$line"; done < inputfile
This works by considering each line as two variables separated by whitespace. The first works out to be the line number. The second is everything else.
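The same splitting can be wrapped in a function, sketched here with a made-up name (strip_first_word). Note the assumption that every line really starts with a number: a line without one would lose its first word instead.

```shell
#!/bin/bash
# Hypothetical helper: copy stdin to stdout without the first
# whitespace-separated word of each line.
strip_first_word() {
  local num rest
  # -r keeps backslashes literal; the || test also handles a final line
  # that has no trailing newline.
  while read -r num rest || [ -n "$num" ]; do
    printf '%s\n' "$rest"
  done
}

strip_first_word <<'EOF'
4 FLDR_NUM POSITION( 1 )
5 INTEGER ,
EOF
```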
A sed-based solution that is portable (i.e. not just for GNU sed) would look like this:
sed -e 's/^[0-9][0-9]*[[:space:]][[:space:]]*//' inputfile
Note that we use the BRE construct [[:space:]][[:space:]]* instead of the simpler ERE construct [[:space:]]+ because every version of sed understands BRE, whereas not every one understands ERE.
If there is a risk of whitespace before the numbers you want to strip, then you can insert [[:space:]]* after the ^ in the substitution's regex.
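Putting both pieces together, a minimal sketch of the portable command with the optional leading whitespace added. The function name strip_numbers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: strip optional indentation, a line number, and the
# whitespace after it, using only POSIX basic regular expressions.
strip_numbers() {
  sed -e 's/^[[:space:]]*[0-9][0-9]*[[:space:]][[:space:]]*//'
}

printf '  4 FLDR_NUM POSITION( 1 )\n' | strip_numbers
```

Because the pattern requires at least one whitespace character after the digits, a line that does not begin with a number followed by whitespace is left untouched.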
Upvotes: 3