mark

Reputation: 1335

How to fetch a particular string using a sed command

I have an input string like below:

   VAL:1|b:2|c:3|VAL:<har:[email protected]>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321

There are more than 1000 rows like this.

I want to fetch the values of the a field and the d field (the first and fourth fields above), but for the d field I want only har:[email protected].

I tried this:

cat $filename | grep -v Orig |sed -e 's/['a:','d:']//g' |awk -F'|' -v OFS=',' '{print $1 "," $4}' >> $NGW_DATA_FILE

The output I got is below:

1,<[email protected]>; tag=vy6r5BpcvQ

I want this instead:

1,har:[email protected]

Where did I make the mistake and how do I solve it?

Upvotes: 1

Views: 217

Answers (3)

anubhava

Reputation: 786289

You may also try this AWK script:

cat file

VAL:1|b:2|c:3|VAL:<har:[email protected]>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321

awk -F '[|;]' '{
   s=""
   for (i=1; i<=NF; ++i)
      if ($i ~ /^VAL:/) {
         gsub(/^[^:]+:|[<>]*/, "", $i)
         s = (s == "" ? "" : s "," ) $i
      }
   print s
}' file

1,har:[email protected],91987654321
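
To see what the loop is iterating over, it can help to print how the field separator '[|;]' splits the record. This is only an illustrative sketch: show_fields is a hypothetical helper name, and the record is the sample copied as-is from above.

```shell
# Sample record, copied as-is from the question.
line='VAL:1|b:2|c:3|VAL:<har:[email protected]>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321'

# show_fields (hypothetical helper): print every field with its index,
# using the same field separator as the script above.
show_fields() {
  awk -F '[|;]' '{ for (i = 1; i <= NF; ++i) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' "$line" | show_fields
```

Because ';' is part of the separator, the "; tag=..." part lands in a field of its own (field 5), so the field holding the bracketed value already ends at '>'.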

Upvotes: 4

David C. Rankin

Reputation: 84652

You can do the same thing with sed rather easily using Extended Regex, two capture groups and two back-references, e.g.

sed -E 's/^[^:]*:(\w+)[^<]*[<]([^>]+).*$/\1,\2/'

Explanation

  • 's/find/replace/' is the standard substitution form, where the find part is:
  • ^[^:]*: from the beginning, skip through the first ':', then
  • (\w+) capture one or more word characters ([a-zA-Z0-9_]; \w is a GNU sed extension), then
  • [^<]*[<] consume zero or more characters that are not a '<', then the '<' itself, then
  • ([^>]+) capture everything up to, but not including, the next '>', and
  • .*$ discard all remaining characters on the line; the replace part is then
  • \1,\2 reinsert the two captured groups separated by a comma.

Example Use/Output

$ echo 'a:1|b:2|c:3|d:<har:[email protected]>; tag=vy6r5BpcvQ|' | 
sed -E 's/^[^:]*:(\w+)[^<]*[<]([^>]+).*$/\1,\2/'
1,har:[email protected]
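
Since \w is a GNU extension, a bracket expression may be safer on other sed implementations. Below is a minimal sketch of the same substitution with [[:alnum:]_] in place of \w. The function name first_and_bracketed is made up for illustration, and it assumes the local sed accepts -E.

```shell
# first_and_bracketed (hypothetical name): same substitution as above,
# with [[:alnum:]_] standing in for the GNU-only \w.
first_and_bracketed() {
  sed -E 's/^[^:]*:([[:alnum:]_]+)[^<]*<([^>]+).*$/\1,\2/'
}

printf '%s\n' 'a:1|b:2|c:3|d:<har:[email protected]>; tag=vy6r5BpcvQ|' | first_and_bracketed
```

Lines that contain no '<' do not match and are printed unchanged. Running sed with -n and adding the p flag to the substitution would print only the lines that matched.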

Upvotes: 3

RavinderSingh13

Reputation: 133770

EDIT: Following the OP's change to the input and the OP's comments, adding the following.

awk '
BEGIN{ FS="|"; OFS="," }
{
  sub(/[^:]*:/,"",$1)
  gsub(/^[^<]*|; .*/,"",$4)
  gsub(/^<|>$/,"",$4)
  print $1,$4
}'  Input_file
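
One way to try the snippet above without a file is to feed it the sample record on standard input. This is a sketch only: the wrapper name first_and_fourth is hypothetical, and the awk body is the one shown above with comments added.

```shell
# first_and_fourth (hypothetical wrapper): reads records on stdin and
# prints the cleaned-up 1st and 4th pipe-separated fields.
first_and_fourth() {
  awk '
  BEGIN{ FS="|"; OFS="," }
  {
    sub(/[^:]*:/,"",$1)          # drop the label of field 1
    gsub(/^[^<]*|; .*/,"",$4)    # drop everything before "<" and from "; " onward
    gsub(/^<|>$/,"",$4)          # drop the surrounding angle brackets
    print $1,$4
  }'
}

printf '%s\n' 'VAL:1|b:2|c:3|VAL:<har:[email protected]>; tag=vy6r5BpcvQ|VAl:1234|name:mnp|VAL:91987654321' | first_and_fourth
```

Because this version picks fields by position rather than by label, it does not depend on what the labels are called.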


For the originally shown samples, please try the following; it was written and tested with those samples in GNU awk.

awk '
BEGIN{
  FS="|"
  OFS=","
}
{
  val=""
  for(i=1;i<=NF;i++){
    split($i,arr,":")
    if(arr[1]=="a" || arr[1]=="d"){
      gsub(/^[^:]*:|; .*/,"",$i)
      gsub(/^<|>$/,"",$i)
      val=(val?val OFS:"")$i
    }
  }
  print val
}
' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                ##Starting awk program from here.
BEGIN{                               ##Starting BEGIN section of this program from here.
  FS="|"                             ##Setting FS as pipe here.
  OFS=","                            ##Setting OFS as comma here.
}
{
  val=""                             ##Nullify val here(to avoid conflicts of its value later).
  for(i=1;i<=NF;i++){                ##Traversing through all fields here
    split($i,arr,":")                ##Splitting current field into arr with delimiter by :
    if(arr[1]=="a" || arr[1]=="d"){  ##Checking condition if first element of arr is either a OR d
      gsub(/^[^:]*:|; .*/,"",$i)     ##Globally removing everything from the start through the 1st colon, OR everything from "; " to the end, in $i.
      gsub(/^<|>$/,"",$i)            ##Removing the leading < and the trailing > from $i.
      val=(val?val OFS:"")$i         ##Appending the current field value to val, separated by OFS.
    }
  }
  print val                          ##printing val here.
}
' Input_file                         ##Mentioning Input_file name here. 
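
As for where the attempt in the question goes wrong: as written there, the shell joins the quoted and unquoted pieces of 's/['a:','d:']//g' into the single script s/[a:,d:]//g. That is a bracket expression, so it deletes every a, d, colon and comma anywhere in the line, not the labels a: and d:. A small sketch to see the effect, using a shortened, made-up record (har:x and tag=t are placeholders, not data from the question):

```shell
# Shortened, made-up record for illustration only.
record='a:1|b:2|c:3|d:<har:x>; tag=t'

# Same quoting as in the question; the shell passes  s/[a:,d:]//g  to sed.
stripped=$(printf '%s\n' "$record" | sed -e 's/['a:','d:']//g')

printf '%s\n' "$stripped"
```

With this record the result is 1|b2|c3|<hrx>; tg=t, which shows characters disappearing from inside the values as well. Nothing in that pipeline removes the angle brackets or the "; tag=..." suffix, which is why the scripts above strip those explicitly.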

Upvotes: 5
