Yash

Reputation: 3114

awk command to read a key value pair from a file

I have a file input.txt which stores information in KEY:VALUE form. I'm trying to read GOOGLE_URL from input.txt, but my script prints only https because the separator is :. What is the problem with my grep/awk command, and how should I print the entire URL?

SCRIPT

$> cat script.sh
#!/bin/bash
URL=`grep -e '\bGOOGLE_URL\b' input.txt | awk -F: '{print $2}'`
printf " $URL \n"

INPUT_FILE

$> cat input.txt
GOOGLE_URL:https://www.google.com/

OUTPUT

https

DESIRED_OUTPUT

https://www.google.com/

Upvotes: 0

Views: 2900

Answers (5)

Ed Morton

Reputation: 203393

Take your pick:

$ sed -n 's/^GOOGLE_URL://p' file
https://www.google.com/

$ awk 'sub(/^GOOGLE_URL:/,"")' file
https://www.google.com/

The above will work using any sed or awk in any shell on every UNIX box.
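As a minimal sketch of how the sed variant could slot into the script from the question: the function name read_google_url is made up for illustration, and the usage lines assume input.txt sits in the current directory. It also swaps the backticks for $(...) and passes the value to printf as an argument rather than as part of the format string.

```shell
#!/bin/bash
# read_google_url FILE (hypothetical helper): print the text after "GOOGLE_URL:".
read_google_url() {
    sed -n 's/^GOOGLE_URL://p' "$1"
}

# Usage, assuming input.txt from the question is in the current directory:
#   URL=$(read_google_url input.txt)
#   printf '%s\n' "$URL"
```

Keeping the variable out of the printf format string means a % or backslash inside the URL is printed literally.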

Upvotes: 1

αғsнιη

Reputation: 2761

Yet another awk alternative:

gawk -F'(^[^:]*:)' '/^GOOGLE_URL:/{ print $2 }' infile

Upvotes: 0

Daweo

Reputation: 36390

I would use GNU AWK for this task in the following way. Let the content of file.txt be:

EXAMPLE_URL:http://www.example.com/
GOOGLE_URL:https://www.google.com/
KEY:GOOGLE_URL:

Then:

awk 'BEGIN{FS="^GOOGLE_URL:"}{if(NF==2){print $2}}' file.txt

will output:

https://www.google.com/

Explanation: in GNU AWK, FS may be a regular expression, so I set it to GOOGLE_URL: anchored (^) to the beginning of the line. That way a GOOGLE_URL: in the middle or at the end of a line is not treated as a separator (consider the 3rd line of input). With this FS each line has either 1 or 2 fields, and it has 2 only if the line starts with GOOGLE_URL:. So I check the number of fields (NF) and, when it is 2, print the 2nd field ($2), since the first field is empty in that case.

(tested in gawk 4.2.1)

Upvotes: 0

anubhava

Reputation: 785108

Since there are multiple : characters in your input, printing $2 will not work in awk because it gives you only the 2nd field. You actually need an equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.
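To make the field splitting concrete, here is a small sketch (the helper names show_fields and rest_after_first_colon are invented for this illustration). The first helper lists every :-separated field with its index, and the second is the plain cut form mentioned above, which does not check the key.

```shell
# show_fields (hypothetical helper): print each ":"-separated field of stdin with its index.
show_fields() {
    awk -F: '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# rest_after_first_colon (hypothetical helper): keep field 2 onward, separators included.
rest_after_first_colon() {
    cut -d: -f2-
}

# Usage:
#   printf 'GOOGLE_URL:https://www.google.com/\n' | show_fields
#   printf 'GOOGLE_URL:https://www.google.com/\n' | rest_after_first_colon
```

For the sample line, show_fields reports three fields (GOOGLE_URL, https and //www.google.com/), which is why $2 alone yields https.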

This awk should work for you:

awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, "");  print}' input.txt

https://www.google.com/

Or this non-regex awk approach, which lets you pass the key name from the command line:

awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt

Or using GNU grep (the -P option requires a build with PCRE support):

grep -oP '^GOOGLE_URL:\K.+' input.txt

https://www.google.com/
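The non-regex awk variant above could be wrapped in a small shell function so the key and file are both arguments. This is only a sketch: get_value is a hypothetical name, and it assumes keys are plain text without backslashes, since awk -v interprets escape sequences in the assigned value.

```shell
# get_value KEY FILE (hypothetical helper): print what follows "KEY:" on lines
# whose first ":"-separated field is exactly KEY.
get_value() {
    awk -F: -v k="$1" '$1 == k { print substr($0, length(k FS) + 1) }' "$2"
}

# Usage, assuming input.txt from the question:
#   get_value GOOGLE_URL input.txt
```

Because the comparison is $1 == k rather than a regex match, a line such as KEY:GOOGLE_URL: is not picked up when asking for GOOGLE_URL.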

Upvotes: 1

RavinderSingh13

Reputation: 133468

Could you please try the following, written and tested with the shown samples in GNU awk. It looks for lines starting with GOOGLE_URL: and extracts the http or https URL that follows. If you need only https, change http[s]? to https in the solution below.

awk '/^GOOGLE_URL:/{match($0,/http[s]?:\/\/.*/);print substr($0,RSTART,RLENGTH)}' Input_file

Explanation: a detailed breakdown of the above.

awk '                               ##Start of the awk program.
/^GOOGLE_URL:/{                     ##If the line starts with GOOGLE_URL:, do the following.
  match($0,/http[s]?:\/\/.*/)       ##Use match() to find http or https (the s is optional), then :// and the rest of the line.
  print substr($0,RSTART,RLENGTH)   ##Print the matched substring, located by RSTART and RLENGTH.
}
' Input_file                        ##Name of the input file.


2nd solution: If you need everything that comes after the first :, try the following.

awk '/^GOOGLE_URL:/{match($0,/:.*/);print substr($0,RSTART+1,RLENGTH-1)}' Input_file
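To see why the 2nd solution uses RSTART+1 and RLENGTH-1, a small sketch can print the values match() sets (show_match is a hypothetical helper name). The pattern /:.*/ starts matching at the first : on the line, so the match includes that colon and the substr call skips it.

```shell
# show_match (hypothetical helper): for each GOOGLE_URL line on stdin, print
# RSTART, RLENGTH and the value extracted as in the 2nd solution.
show_match() {
    awk '/^GOOGLE_URL:/ {
        match($0, /:.*/)    # matches from the first ":" through end of line
        printf "RSTART=%d RLENGTH=%d value=%s\n", RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
    }'
}

# Usage:
#   printf 'GOOGLE_URL:https://www.google.com/\n' | show_match
#   prints: RSTART=11 RLENGTH=24 value=https://www.google.com/
```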

Upvotes: 1
