Kalpana Pinninty

Reputation: 59

How to format a CSV file using awk commands (trying to avoid manual rework in the CSV file) without adding the delimiter by hand

I am very new to AWK; thanks for your suggestions in advance.

Here is my question: I am getting the output shown below, with the fields separated by spaces instead of the comma-separated rows I need.

hostname                                               port
http://example.com/token                               80
https://digits.com                                     443
https://examples.demo.com?grant_type                   443
http://demo/paying/security/tokens/demoitexample.com   80
http://demo/paying/security/tokens/demoitexample1.com  80
http://demo/paying/security/tokens/demoitexample2.com  80
http://demo/paying/security/tokens/demoitexample2.com  80

But I would like to get the output below without making any manual changes to the CSV file.

Expected output:

hostname,port
http://example.com/token,80
https://digits.com,443
https://examples.demo.com?grant_type,443
http://demo/paying/security/tokens/demoitexample.com,80
http://demo/paying/security/tokens/demoitexample1.com,80
http://demo/paying/security/tokens/demoitexample2.com,80
http://demo/paying/security/tokens/demoitexample2.com,80

Here is the code I am currently using; it would be great if we could combine both commands into a single command.

grep -P '((?<=[^0-9.]|^)[1-9][0-9]{0,2}(\.([0-9]{0,3})){3}(?=[^0-9.]|$)|(http|ftp|https|ftps|sftp)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/+#-]*[\w@?^=%&/+#-])?|\.port|\.host|contact-points|\.uri|\.endpoint)' abc.properties |
grep '^[^#]' |
awk '{split($0,a,"#"); print a[1]}' |
awk '{split($0,a,"="); print a[1],a[2]}' |
sed 's/^\|#/,/g' |
awk '/http:\/\//  {print $2,80}
     /https:\/\// {print $2,443}
     /Points/     {print $2,"9042"}
     /host/       {h=$2}
     /port/       {print h,$2; h=""}' |
awk -F'[, ]' '{for(i=1;i<NF;i++){print $i,$NF}}' |
column -t
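One way to get the comma-separated output directly would be to change only the final stage of the pipeline above: print a comma instead of the default space and drop the trailing `column -t`. A minimal sketch of just that last stage, with one sample line inlined for illustration:

```shell
# Final pipeline stage only: join host and port with a comma
# instead of a space, so column -t is no longer needed
printf 'http://example.com/token 80\n' |
awk -F'[, ]' '{ for (i = 1; i < NF; i++) print $i "," $NF }'
# http://example.com/token,80
```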

Upvotes: 0

Views: 64

Answers (2)

Wiimm

Reputation: 3517

The solution of RavinderSingh13 has a pitfall: if the URL contains a comma, the output CSV is broken. URLs can't contain a space (or any blank) by definition, so a space is a safe field separator here.

Examples of critical input records:

http://demo/paying/security/tokens/demoitexample1.com?a=b,c  80
http://demo/paying/security/tokens,more/demoitexample2.com   80

A solution is this awk command:

awk '{ gsub(/"/,"\\\""); printf("\"%s\",\"%s\"\n",$1,$2)}' Input_file

Here the URL and the port are both quoted. Quoting the port is not necessary if you can guarantee that it is a simple word. Additionally, all " characters (invalid in URLs, but rarely seen) are replaced by \", so a " in a URL does not break the output. I have never seen a backslash in a URL, so I did not replace it with gsub(). Anyway, this would do it: gsub(/["\\\\]/,"\\\\&"); (the quadruple backslash \\\\ is reduced to \\ when awk parses it; the single quotes keep bash from touching it).
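Run on one of the critical records above, the quoting variant keeps the embedded comma safely inside a quoted field. A sketch, with the input line inlined instead of read from Input_file:

```shell
# Quote both fields; escape any embedded double quotes first
printf 'http://demo/paying/security/tokens/demoitexample1.com?a=b,c  80\n' |
awk '{ gsub(/"/,"\\\""); printf("\"%s\",\"%s\"\n", $1, $2) }'
# "http://demo/paying/security/tokens/demoitexample1.com?a=b,c","80"
```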

Upvotes: 1

RavinderSingh13

Reputation: 133458

With your shown samples, could you please try the following:

awk 'BEGIN{OFS=","} {$1=$1} 1' Input_file

OR, if you have only 2 fields to deal with, then try as per @Rafaf's comment:

awk '{print $1","$2}' Input_file

OR

awk 'BEGIN{OFS=","} {print $1,$2}' Input_file
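For reference, the `$1=$1` in the first command forces awk to rebuild the record using the new OFS, so every run of blanks collapses to a single comma. A sketch with sample input inlined:

```shell
# $1=$1 marks the record as modified, so printing it ("1")
# re-joins the fields with OFS="," instead of the original spaces
printf 'hostname port\nhttps://digits.com    443\n' |
awk 'BEGIN{OFS=","} {$1=$1} 1'
# hostname,port
# https://digits.com,443
```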

Upvotes: 2
