iemcd

Reputation: 3

Filter list of URLs based on content headers

I have a list of URLs, and I would like to keep only the ones whose response has a certain Content-Type header. This is what I am trying:

$ cat url_list | tee [???] | xargs curl -sIL | grep -qiE 'Content-Type: text' && echo [???]

but I don't know what to do for the [???] in tee and echo. I think the solution will use process substitution or file descriptors, but I haven't been able to make it work.

Upvotes: 0

Views: 103

Answers (1)

Charles Duffy

Reputation: 295650

xargs is the wrong tool for this job -- and when you don't use it, you don't need tee either.
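To see why, note that by default xargs packs as many input lines as it can into a single command line, so one curl invocation gets several URLs and its header dump no longer tells you which URL produced which headers. A small sketch of that batching, using two made-up example URLs and `echo` in place of actually running curl:

```shell
# Hypothetical URLs; `echo` stands in for curl, so nothing is fetched.
batched=$(printf '%s\n' 'http://a.example' 'http://b.example' | xargs echo curl -sIL)

# Show the single command line that xargs built from both input lines.
printf '%s\n' "$batched"
```

Both URLs end up as arguments to the same command (`curl -sIL http://a.example http://b.example`), which is why a grep on the combined output cannot be mapped back to one URL.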

#!/usr/bin/env bash

# Create an array called text_urls
text_urls=( )
while IFS= read -r line; do
  if curl -sIL "$line" | grep -qiE 'Content-Type: text'; then
    text_urls+=( "$line" )
  fi
done <url_list

# Demonstrate the data stored in that array variable
echo "The following ${#text_urls[@]} URLs have Content-Type: text --"
printf '  %s\n' "${text_urls[@]}"

See BashFAQ #1 describing the while read loop, and BashFAQ #24 describing why pipelines make storing data as variables harder.
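One caveat with grepping the whole `curl -sIL` dump: with `-L`, curl prints one header block per redirect hop, so a redirect response that carries `Content-Type: text/html` can match even when the final response is something else. The following is a sketch of one way to look only at the last header block. The function names `final_content_type` and `filter_text_urls` are made up for this example, and it prints matching URLs to stdout rather than storing them in an array.

```shell
#!/usr/bin/env bash

# final_content_type (hypothetical helper): reads a `curl -sIL` header dump
# on stdin and prints the lowercased Content-Type value of the LAST response.
final_content_type() {
  awk '
    { sub(/\r$/, "") }               # drop the CR that ends each HTTP header line
    /^HTTP\// { ctype = "" }         # a status line starts a new response block
    tolower($0) ~ /^content-type:/ {
      ctype = $0
      sub(/^[^:]*:[ \t]*/, "", ctype)  # keep only the header value
    }
    END { print tolower(ctype) }
  '
}

# filter_text_urls (hypothetical helper): reads URLs on stdin, one per line,
# and prints those whose final response has a text/* Content-Type.
filter_text_urls() {
  local url ctype
  while IFS= read -r url; do
    ctype=$(curl -sIL "$url" | final_content_type)
    case $ctype in
      text/*) printf '%s\n' "$url" ;;
    esac
  done
}
```

With these defined, `filter_text_urls <url_list` prints the matching URLs one per line, which can be redirected to a file or read back into an array.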

Upvotes: 1
