Kiran

Reputation: 5526

Removing line numbers with sed

I have a file with the following format:

1         LOAD INTO TABLE
2             TBLNAME
3          (
4          FLDR_NUM                               POSITION(       1         )
5          INTEGER                                      ,
4          FLDR_NUM                               POSITION(       5         )
5          INTEGER                                      
6          )

I need to get rid of the line numbers, read the field info, and build a JSON-like structure. As a first step, I am doing:

#!/bin/bash
count=1
while read line || [ -n "$line" ]
do
    name=$(sed -e 's/^[0-9][0-9]?\s*//' <<< "$line")
    count=$((count+1))
    # if [ $count -gt 3 ]
    # then
      echo "Name $name"
    # fi
done < "$1"

Here is what I am trying to achieve: essentially, remove everything up to the first non-whitespace character after the line number. E.g., for line 4:

FLDR_NUM                               POSITION(       1         )

Updated the regex.

Upvotes: 1

Views: 561

Answers (4)

Renato Zannon

Reputation: 29941

sed interprets \( as the start of a group, because in basic regular expressions the escaped form is the metacharacter. To match a literal (, you just need to stop escaping it:

sed -e 's/^[0-9][0-9]\?\s*(*//'

It seems like this is the command you want: s/^[0-9][0-9]\?\s*//

It will remove any one- or two-digit number at the beginning of a line, followed by any amount of whitespace. Note the backslash in \?: without it, sed's default syntax treats ? as a literal question mark. If you want to match one or more digits (instead of just one or two), change [0-9][0-9]\? to [0-9]\+.

$ sed -e 's/^[0-9]\+\s*//' < example.txt

LOAD INTO TABLE
TBLNAME
(
FLDR_NUM                               POSITION(       1         )
INTEGER                                      ,
FLDR_NUM                               POSITION(       5         )
INTEGER                                      
)
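
The reason the regex from the question matches nothing is easy to check: in sed's default (basic) syntax an unescaped ? is an ordinary character. A minimal sketch, with made-up sample strings and a made-up function name:

```shell
# show_literal_qmark is a hypothetical name used only for this demonstration.
show_literal_qmark() {
    # The first sample has a real "?" after two digits; the second does not.
    printf '%s\n' '12?abc' '12abc' | sed -e 's/^[0-9][0-9]?//'
}
show_literal_qmark
```

Only the first sample is changed, because the pattern requires two digits followed by an actual question mark.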

EDIT: according to @ghoti, this isn't portable to every sed implementation, since \+ and \s are GNU extensions.

Upvotes: 1

Emmet

Reputation: 6401

This isn't going to be easy in sed. I mean, getting rid of the leading numbers and whitespace is easy, but the rest of what you want to do is going to be tough.

I would be more likely to choose awk:

awk -F'[ )(]+' '
    NF==2 && /[A-Z]/    {print "{ " $2 " => "                   }  # TBLNAME
    NF==5 && $2!="LOAD" {fldr_num=$2; pos=$4                    }  # FLDR_NUM/POSITION
    NF==3               {print "\t" $2 "/" fldr_num "/" pos "," }  # INTEGER
    END                 {print "}"                              }  # Right brace
' infile.foo

It's not exactly what you're looking for, but it illustrates the basics of extracting the information you're interested in and reformatting/rearranging it.

Hope this helps.

Upvotes: 0

tripleee

Reputation: 189317

If the line numbers are fixed width, just cut them off by column:

cut -c11- file >file.new

If your final target is some sort of parsed JSON output, then whatever you are using to do the actual parsing is probably also very well equipped to skip the line numbers.
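
As one illustration, suppose the parsing were done in awk (an assumption; no tool is named above). The number can then be dropped in the same pass that reads the fields. The function name is made up for this sketch:

```shell
# strip_numbers_awk is a hypothetical helper, not part of any answer here.
strip_numbers_awk() {
    # sub() removes the leading digits and the blanks after them from $0;
    # any further parsing could follow inside the same action block.
    awk '{ sub(/^[0-9]+[ \t]+/, ""); print }' "$@"
}
# usage: strip_numbers_awk file
```

Unlike cut, this does not depend on the numbers being a fixed width.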

Upvotes: 0

ghoti

Reputation: 46826

You can do this so many ways. One of the easiest might be just to use bash alone:

$ while read -r num line; do echo "$line"; done < inputfile

This works by considering each line as two variables separated by whitespace. The first works out to be the line number. The second is everything else.
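
A slightly more defensive sketch of the same idea, assuming bash. The function name is made up; -r stops read from interpreting backslashes, and the extra test keeps a final line that has no trailing newline:

```shell
#!/bin/bash
# strip_first_field is a hypothetical helper name used for illustration.
strip_first_field() {
    local num rest
    # read splits on whitespace: the first word goes to num,
    # everything after it (inner spacing preserved) goes to rest.
    while read -r num rest || [ -n "$num" ]; do
        printf '%s\n' "$rest"
    done
}
# usage: strip_first_field < inputfile
```

Note that read also trims trailing whitespace from the remainder, which may or may not matter for later parsing.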

A sed-based solution that is portable (i.e. not just for GNU sed) would look like this:

sed -e 's/^[0-9][0-9]*[[:space:]][[:space:]]*//' inputfile

Note that we use the BRE construct [[:space:]][[:space:]]* instead of the simpler ERE construct [[:space:]]+ because every version of sed understands BRE, whereas not every one understands ERE.
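
For comparison, here is a sketch of the ERE spelling, assuming a sed that accepts the -E flag (GNU and BSD sed do; elsewhere, treat that as an assumption). The function name is made up for the example:

```shell
# strip_numbers_ere is a hypothetical wrapper name.
strip_numbers_ere() {
    # With -E, + means "one or more" without needing a backslash.
    sed -E 's/^[0-9]+[[:space:]]+//' "$@"
}
# usage: strip_numbers_ere inputfile
```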

If there is a risk of whitespace before the numbers you want to strip, then you can insert [[:space:]]* after the ^ in the substitution's regex.
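
Spelled out, that variant might look like the sketch below. It is still plain BRE; the function name is made up for illustration:

```shell
# strip_numbers_ws is a hypothetical wrapper name.
strip_numbers_ws() {
    # optional leading blanks, one or more digits, one or more blanks
    sed -e 's/^[[:space:]]*[0-9][0-9]*[[:space:]][[:space:]]*//' "$@"
}
# usage: strip_numbers_ws inputfile
```

Lines that do not start with a number are passed through unchanged.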

Upvotes: 3
