Extract optional field values from lines

Question

I have text made up of separate lines, each in a CSV-like format:

SOME BUNCH OF TEXT, FIELD_A: 12, FIELD_B: 0.2321, FIELD_C: 12:10:08 2011/07/22, FIELD_D: 656

The order of the fields is always the same, but some fields may be absent. Other fields can also appear between the fields of interest. For example, instead of the line above I might get the following:

SOME BUNCH OF TEXT, FIELD_A: 12, NOT_INTERESTED: 235, FIELD_B: 0.2321, FIELD_C: 12:10:08 2011/07/22, FIELD_D: 656, FIELDS

As the result of processing this text, I want a clean CSV file with my fields listed one after another:

12,0.2321,12:10:08 2011/07/22,656

If a field is absent, I would like to simply omit its value (here, FIELD_B was absent):

12,,12:10:08 2011/07/22,656

How can I do this using commands like sed, perl or awk? I tried extracting a single field with perl -pe 's/^.*?(FIELD_A: (.*?),)?.*?$/\2/' and failed: the regex simply ignores my field even when it is present.

dogbane · Accepted Answer

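You can use awk with an associative array, as shown below. Loop over the comma-separated fields, split each one at its first colon, and store the resulting key-value pair in an associative array. Finally, print the fields you want; a field that is absent from the line prints as an empty string.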

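awk -F, '{
  # clear the array so values from the previous line do not carry over
  split("", arr)
  for (i = 1; i <= NF; i++) {
    a = index($i, ":")
    if (a != 0) {
      # split on the first colon to get the key-value pair
      # (later colons, as in the FIELD_C timestamp, stay in the value)
      key = substr($i, 1, a - 1)
      val = substr($i, a + 1)

      # remove leading spaces from key and value
      gsub(/^ */, "", key)
      gsub(/^ */, "", val)

      # store in an associative array
      arr[key] = val
    }
  }
  # print out the desired fields; missing keys yield empty strings
  print arr["FIELD_A"]","arr["FIELD_B"]","arr["FIELD_C"]","arr["FIELD_D"]
}' data.txt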
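
If the list of wanted fields changes often, the same idea can be wrapped in a small shell function that takes the field names as an argument. This is only a sketch of one possible variation: the function name extract_fields and the awk variable want are made up for illustration, and it also trims trailing spaces, which the command above does not do.

```shell
# Hypothetical helper: print the requested fields from each input line as CSV.
# $1 is a comma-separated list of field names, in output order.
extract_fields() {
  awk -F, -v want="$1" '
    BEGIN { n = split(want, names, ",") }
    {
      split("", arr)                 # clear values left over from the previous line
      for (i = 1; i <= NF; i++) {
        pos = index($i, ":")
        if (pos == 0) continue       # no colon: not a key-value pair
        key = substr($i, 1, pos - 1)
        val = substr($i, pos + 1)
        gsub(/^ +| +$/, "", key)     # trim surrounding spaces
        gsub(/^ +| +$/, "", val)
        arr[key] = val
      }
      line = ""; sep = ""
      for (j = 1; j <= n; j++) {     # absent fields contribute an empty string
        line = line sep arr[names[j]]
        sep = ","
      }
      print line
    }'
}

# Usage with the sample lines from the question (second line lacks FIELD_B)
extract_fields "FIELD_A,FIELD_B,FIELD_C,FIELD_D" <<'EOF'
SOME BUNCH OF TEXT, FIELD_A: 12, NOT_INTERESTED: 235, FIELD_B: 0.2321, FIELD_C: 12:10:08 2011/07/22, FIELD_D: 656, FIELDS
SOME BUNCH OF TEXT, FIELD_A: 12, FIELD_C: 12:10:08 2011/07/22, FIELD_D: 656
EOF
```

For these two lines it should print 12,0.2321,12:10:08 2011/07/22,656 followed by 12,,12:10:08 2011/07/22,656.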
