MarvinLeRouge

Reputation: 534

Multiple curl in parallel with limitations

I have a JSON file whose entries contain URLs (among other things), which I retrieve with curl. I'd like to run several iterations of the loop at once to go faster, but also to cap the number of parallel curls, to avoid being kicked out by the remote server. For now, my code looks like this:

  jq -r '.entries[] | select(.enabled != false) | .id,.unitUrl' "$fileIndexFeed" | \
  while read -r unitId; do
    read -r unitUrl
    if ! in_array tabAnnoncesExistantesIds "$unitId"; then
      fullUnitUrl="$unitUrlBase$unitUrl"
      unitFile="$unitFileBase$unitId.json"
      if [ ! -f "$unitFile" ]; then
        curl -H "Authorization:$authMethod $encodedHeader" -X GET "$fullUnitUrl" -o "$unitFile"
      fi
    fi
  done

If I use a simple & at the end of my curl, it launches lots of concurrent requests, and I could get kicked. So the question would be (I suppose): how do I know that a curl run with & has finished its job? If I can detect that, then I guess I can test, increment and decrement a variable tracking the number of running curls.
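For what it's worth, that counter idea can be done with bash's own job control: count the running background jobs with `jobs -rp` and block with `wait -n` (bash 4.3+) until one exits. A minimal sketch, with a stub `fetch` standing in for the real curl call and made-up URLs and limit:

```shell
#!/usr/bin/env bash
# Sketch: cap the number of concurrent background jobs with bash job control.
# 'fetch', 'maxJobs' and the url list are placeholders, not from the question.
maxJobs=3
outFile=$(mktemp)

fetch() {
  # placeholder for: curl -H "Authorization:..." -o "$unitFile" "$fullUnitUrl"
  sleep 0.1
  echo "fetched $1" >> "$outFile"
}

for url in url1 url2 url3 url4 url5 url6; do
  # While maxJobs jobs are still running, block until one of them exits.
  while [ "$(jobs -rp | wc -l)" -ge "$maxJobs" ]; do
    wait -n   # bash 4.3+: wait for any one background job to finish
  done
  fetch "$url" &
done
wait   # let the last jobs drain
```

This avoids manually incrementing and decrementing a counter: the job table itself is the counter.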

Thanks

Upvotes: 1

Views: 1244

Answers (2)

Ole Tange

Reputation: 33685

Use a Bash function:

doit() {
  unitId="$1"
  unitUrl="$2"
  if ! in_array tabAnnoncesExistantesIds "$unitId"; then
    fullUnitUrl="$unitUrlBase$unitUrl"
    unitFile="$unitFileBase$unitId.json"
    if [ ! -f "$unitFile" ]; then
      curl -H "Authorization:$authMethod $encodedHeader" -X GET "$fullUnitUrl" -o "$unitFile"
    fi
  fi
}

jq -r '.entries[] | select(.enabled != false) | .id,.unitUrl' $fileIndexFeed |
  env_parallel -N2 doit

env_parallel will import the environment, so all shell variables are available.
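The `-N2` matters because jq prints the id and the URL on alternate lines; each pair of lines becomes `$1` and `$2` of `doit`. The same two-lines-per-job grouping can be illustrated with plain `xargs` (a toy demonstration with made-up ids, not part of the answer itself):

```shell
# Illustrate grouping stdin two lines at a time, as parallel -N2 does.
# With sh -c, the two args xargs appends land in $0 and $1.
printf '%s\n' id1 /u/1 id2 /u/2 |
  xargs -n 2 sh -c 'echo "id=$0 url=$1"'
```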

Upvotes: 1

Mark Setchell

Reputation: 207345

Use GNU Parallel to control the number of parallel jobs. Either write your curl commands to a file first, so you can inspect and check them:

commands.txt

curl "something" "somehow" "toSomewhere"
curl "somethingelse" "someotherway" "toSomewhereElse"

Then, if you want no more than 8 jobs running at a time, run:

parallel -j 8 --eta -a commands.txt

Or you can just write the commands to GNU Parallel's stdin:

jq ... | while read ...; do
    printf 'curl ...\n'    # each command must end with a newline
done | parallel -j 8
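A sketch of that generation step, with placeholder ids and URLs (in the real pipeline the values come from jq, and the output would be piped to `parallel -j 8` instead of inspected):

```shell
# Emit one newline-terminated curl command per entry.
# printf %q shell-quotes each value so odd characters can't break the command.
printf '%s %s\n' id1 /unit/1 id2 /unit/2 |
while read -r unitId unitUrl; do
  printf 'curl -o %q %q\n' "unit_$unitId.json" "https://example.com$unitUrl"
done
```

Each line is then one independent job for GNU Parallel to schedule.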

Upvotes: 3
