Sugan Babu

Reputation: 3

How to delimit the fields of an input file which has special characters?

I am new to shell scripting. I need to take an input file whose fields are separated by runs of spaces and framed by dashed lines, and rewrite it with ";" as the field delimiter.

Input file:

----------------------------------
Server                    Port
----------------------------------
Local                      1001

-----------------------------------------
Name        Country        Count
-----------------------------------------
XXX          Bermuda        999

So my requirement is to get the output like,

Output :

Server;Port;Name;Country;Count
Local;1001;XXX;Bermuda;999

Please help me achieve this. Any tool is fine (awk, sed, etc.), and I don't want the dashed lines in the output.

Upvotes: 0

Views: 98

Answers (3)

Jose Ricardo Bustos M.

Reputation: 8164

Another solution, using sed and awk:

sed -E '/^-/d;/^$/d;s/[[:space:]]+/;/g' file | 
awk '{d[NR%2]=(d[NR%2]?d[NR%2]";":"")$0}END{print d[1]; print d[0]}'

or with awk only:

awk '/^-/ || !NF{next}{
    gsub(/[[:space:]]+/,";")
    d[i%2]=(d[i%2]?d[i%2]";":"")$0
    ++i
}END{print d[0]; print d[1]}' file

you get,

Server;Port;Name;Country;Count
Local;1001;XXX;Bermuda;999

Edit: if the input contains values with embedded spaces, such as john smith or Saudi Arabia:

----------------------------------
Server                    Port
----------------------------------
Local                      1001

-----------------------------------------
Name        Country        Count
-----------------------------------------
john smith          Saudi Arabia        999

you can use [[:space:]][[:space:]]+ instead of [[:space:]]+, so that only runs of two or more whitespace characters are treated as field separators.

you get,

Server;Port;Name;Country;Count
Local;1001;john smith;Saudi Arabia;999
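
For completeness, the awk-only version with that substitution might look like the sketch below. It assumes columns are always separated by at least two whitespace characters, that values contain only single spaces, and that lines carry no leading whitespace. The sample.txt file name and the join_fields function name are made up for the demo. Trailing whitespace is stripped first so it cannot turn into a stray ";".

```shell
# Hypothetical sample file reproducing the edited input above.
cat > sample.txt <<'EOF'
----------------------------------
Server                    Port
----------------------------------
Local                      1001

-----------------------------------------
Name        Country        Count
-----------------------------------------
john smith          Saudi Arabia        999
EOF

# join_fields is an illustrative name, not part of any tool.
join_fields() {
    awk '/^-/ || !NF { next }                   # skip dashed and blank lines
    {
        sub(/[[:space:]]+$/, "")                # drop trailing whitespace
        gsub(/[[:space:]][[:space:]]+/, ";")    # only runs of 2+ blanks split fields
        d[i%2] = (d[i%2] ? d[i%2] ";" : "") $0  # even kept lines: headers, odd: data
        ++i
    }
    END { print d[0]; print d[1] }' "$1"
}

join_fields sample.txt
```

Wrapping the command in a function only makes it easier to reuse on other files; the awk program itself is the same as the one-liner above with the wider separator pattern.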

Upvotes: 1

James Brown

Reputation: 37404

Awk only. It assumes the delimiter is two or more spaces, so that multi-word values such as Ber muda remain possible:

$ awk 'BEGIN{
         FS="  +";            # delimiter is two or more spaces
         OFS=";"              # output delimiter
     } 
     /^-*$/ { next }          # dashed or empty records are discarded
     {
         $1=$1;               # rebuild records to change delimiters
         if(/^Server|^Name/)  # gather header
             h=h $0 OFS; 
         else                 # gather data
             d=d $0 OFS
     } 
     END {                    # print header and data record
         print h; 
         print d
     }' file
Server;Port;Name;Country;Count;
Local;1001;XXX;Ber muda;999;

The downside is the trailing OFS, but that can be removed with a couple of sub() calls.
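
One way to drop that trailing delimiter, sketched here rather than taken from the answer itself, is to call sub() on each accumulated string in the END block. The sample.txt file name and the join_sections function name are illustrative, and the sample data uses the same "Ber muda" variant as the run above.

```shell
# Hypothetical sample file with a multi-word country value.
cat > sample.txt <<'EOF'
----------------------------------
Server                    Port
----------------------------------
Local                      1001

-----------------------------------------
Name        Country        Count
-----------------------------------------
XXX          Ber muda        999
EOF

# join_sections is an illustrative name, not part of any tool.
join_sections() {
    awk 'BEGIN { FS = "  +"; OFS = ";" }   # split on 2+ spaces, join with ";"
    /^-*$/ { next }                        # skip dashed and empty lines
    {
        $1 = $1                            # rebuild the record with OFS
        if (/^Server|^Name/) h = h $0 OFS  # header lines
        else                 d = d $0 OFS  # data lines
    }
    END {
        sub(/;$/, "", h)                   # strip the one trailing ";"
        sub(/;$/, "", d)
        print h
        print d
    }' "$1"
}

join_sections sample.txt
```

An alternative would be to add the separator before each record instead of after it, but stripping it once at the end keeps the main block identical to the original.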

Upvotes: 0

dawg

Reputation: 103824

To get started,

  1. Remove lines that are blank or consist only of dashes with sed. If your files all have the same layout as the example, that leaves header lines on odd lines and data lines on even lines.
  2. Entab the file with the POSIX utility unexpand. This turns runs of spaces into tabs but leaves single spaces alone. (This is not necessary if the file is already TSV.)
  3. Use awk to process the entabbed file into a header row and a data row, each separated by ;

Demo:

sed -E '/^--*$|^$/d' file | unexpand -a | awk 'BEGIN { FS = "\t+" }   # a run of tabs is one delimiter
{
    for (i = 1; i <= NF; i++) {
        gsub(/^ +| +$/, "", $i)                      # trim spaces unexpand left behind
        if (NR % 2) h = (h == "" ? "" : h ";") $i    # odd lines: headers
        else        b = (b == "" ? "" : b ";") $i    # even lines: data
    }
}
END { print h; print b }'

Prints:

Server;Port;Name;Country;Count
Local;1001;XXX;Bermuda;999

This will support data fields with single embedded spaces, such as 'Saudi Arabia'.

Upvotes: 0
