Reputation: 313

awk split on a different token

I am trying to initialize an array from a string split with awk. I expect the tokens to be delimited by ",", but somehow they aren't.

The input is a string returned by curl from the address http://www.omdbapi.com/?i=&t=the+campaign. I've tried to remove any extra carriage returns or anything else that could cause confusion, but in every client I have checked it appears to be a single-line string.

{"Title":"The Campaign","Year":"2012","Rated":"R", ...

and this is the output

    -metadata {"Title":"The **-metadata** Campaign","Year":"2012","Rated":"R","....

It should have been

   -metadata {"Title":"The Campaign"

Here's my piece of code:

__tokens=($(echo $omd_response | awk -F ',' '{print}'))
for i in "${__tokens[@]}"
  do
    echo "-metadata" "$i"
done

Any help is welcome

Upvotes: 0

Answers (2)

rici

Reputation: 241861

I would take seriously the comment by @cbuckley: use a JSON-aware tool rather than trying to parse the line with simple string tools. Otherwise, your script will break if a quoted string has a comma inside, for example.
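As a rough sketch of that JSON-aware approach, assuming `jq` is installed (it is not mentioned in the question) and using a shortened, made-up response with a comma inside one value, each key/value pair can become one array element:

```shell
#!/usr/bin/env bash
# Hypothetical, shortened response; the real one has more keys.
omd_response='{"Title":"The Campaign","Year":"2012","Genre":"Comedy, Satire"}'

tokens=()
if command -v jq >/dev/null 2>&1; then
  # jq emits one "key=value" line per entry; read them line by line into the array.
  while IFS= read -r line; do
    tokens+=("$line")
  done < <(jq -r 'to_entries[] | "\(.key)=\(.value)"' <<< "$omd_response")
fi

for token in "${tokens[@]}"; do
  printf '%s\n' "-metadata $token"
done
```

Because jq parses the JSON itself, the comma inside the Genre value stays in a single element, which plain comma splitting cannot guarantee.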

In any event, you don't need awk for this exercise, and it isn't helping you: the way awk breaks the string into fields only matters inside awk. Once the string is printed to stdout, it is the same string as before. If you want the shell to use , as a field delimiter, you have to tell the shell to do so.
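To see this concretely, here is a small sketch with a made-up, shortened response: awk prints the line unchanged, and the unquoted command substitution is then split by the shell on whitespace, which is why "The" and "Campaign" land in different elements.

```shell
#!/usr/bin/env bash
# Hypothetical, shortened response used only for illustration.
omd_response='{"Title":"The Campaign","Year":"2012"}'

# awk splits into fields internally, but '{print}' emits the whole record as-is.
awk_output=$(echo "$omd_response" | awk -F ',' '{print}')

# Unquoted expansion: the shell splits on IFS (space, tab, newline by default).
# It is also subject to glob expansion; this sample has no glob characters.
tokens=($awk_output)

printf 'element: %s\n' "${tokens[@]}"
```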

Here's one way to do it:

(
  OLDIFS=$IFS
  IFS=,
  tokens=($omd_response)
  IFS=$OLDIFS

  for token in "${tokens[@]}"; do
    echo "-metadata $token"  # do something with token
  done
)

The ( and ) just run all of that in a subshell, so the shell variables are temporary. You can do it without them.
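One way to do it without the subshell, sketched here assuming bash and a single-line response (the sample string is made up), is to set IFS only for a single `read`, so nothing has to be saved or restored:

```shell
#!/usr/bin/env bash
# Hypothetical, shortened response used only for illustration.
omd_response='{"Title":"The Campaign","Year":"2012","Rated":"R"}'

# IFS=, applies only to this read; -r keeps backslashes, -a fills an array.
IFS=, read -r -a tokens <<< "$omd_response"

for token in "${tokens[@]}"; do
  printf '%s\n' "-metadata $token"
done
```

Unlike the unquoted `($omd_response)` assignment, `read` does not perform pathname expansion on the pieces, so a `*` in the data would not be expanded against the current directory.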

Upvotes: 2

Olivier Dulac

Reputation: 3791

First, please accept my apologies: I don't have a recent bash at hand, so I can't test the code below (no arrays!).

But it should work; if not, you should be able to tweak it (or ask below, with a little context on what you see, and I'll help fix it).

nb_fields=$(echo "${omd_response}" | tr ',' '\n' | wc -l | awk '{ print $1 }')
  #The nb_fields will be correct UNLESS ${omd_response} contains a trailing "\", 
  #in which case it would be 1 too big, and below would create an empty 
  # __tokens[last_one], giving an extra `-metadata ""`. easily corrected if it happens.

#the code below assumes there is at least 1 field... You should maybe check that.

#1) we create the __tokens[] array
for field in $( seq  1 $nb_fields )
do
   #optional: if field is 1 or $nb_fields, add processing to get rid of the { or } ?
   __tokens[$field]=$(echo "${omd_response}" | cut -d ',' -f "${field}")
done

#2) we use it to output what we want
for i in $( seq  1 $nb_fields )
do
   printf '-metadata "%s" '   "${__tokens[$i]}"
      #will output all on 1 line. 
      #You could add a \n just before the last ' so it goes each on different lines
done

This way I loop over field numbers rather than over values that might themselves contain spaces or tabs.

Upvotes: 1
