Reputation: 3
I have a list of URLs, and I would like to keep only the ones that return a certain Content-Type header. This is what I am trying:
$ cat url_list | tee [???] | xargs curl -sIL | grep -qiE 'Content-Type: text' && echo [???]
but I don't know what to do for the [???] in tee and echo. I think the solution will use process substitution or file descriptors, but I haven't been able to make it work.
Upvotes: 0
Views: 103
Reputation: 295650
xargs is the wrong tool for this job -- and when you don't use it, you don't need tee either.
#!/usr/bin/env bash

# Create an array called text_urls
text_urls=( )
while IFS= read -r line; do
  if curl -sIL "$line" | grep -qiE 'Content-Type: text'; then
    text_urls+=( "$line" )
  fi
done <url_list

# Demonstrate the data stored in that array variable
echo "The following ${#text_urls[@]} URLs have Content-Type: text --"
printf ' %s\n' "${text_urls[@]}"
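One caveat with grepping the output of curl -sIL: with -L, curl prints the headers of every response in the redirect chain, so a redirect served as text/html can make a URL match even when the final resource is something else. A hedged variant (a sketch, not part of the code above) is to ask curl for the Content-Type of the last response only, via its -w '%{content_type}' write-out variable. The function names get_content_type and filter_text_urls are made up for illustration:

```shell
#!/usr/bin/env bash

# Print the Content-Type curl recorded for the final response of one URL.
# -o /dev/null discards the header dump; -w prints only the recorded value.
get_content_type() {
  curl -sIL -o /dev/null -w '%{content_type}' "$1"
}

# Read URLs from stdin (one per line) and print those whose final
# Content-Type starts with "text", compared case-insensitively.
filter_text_urls() {
  local url ctype
  while IFS= read -r url; do
    [[ -n $url ]] || continue            # skip blank lines
    ctype=$(get_content_type "$url")
    case $ctype in
      [Tt][Ee][Xx][Tt]*) printf '%s\n' "$url" ;;
    esac
  done
}

# Usage (assumes the url_list file from the question):
#   filter_text_urls <url_list
```

If curl fails for a URL, the captured value is empty and the URL is simply skipped. Whether that is the behavior you want depends on your list.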
See BashFAQ #1 describing the while read loop, and BashFAQ #24 describing why pipelines make storing data as variables harder.
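To see the BashFAQ #24 problem in isolation, here is a small sketch with made-up input (one, two, three stand in for URLs). It assumes bash's default settings, i.e. the lastpipe option is not enabled:

```shell
#!/usr/bin/env bash

# Pipeline version: the while loop runs in a subshell, so the array it
# fills is lost as soon as the pipeline finishes.
piped=( )
printf '%s\n' one two three | while IFS= read -r line; do
  piped+=( "$line" )
done
echo "after pipeline: ${#piped[@]} items"

# Redirection version: the loop runs in the current shell, so the array
# is still populated afterwards.
redirected=( )
while IFS= read -r line; do
  redirected+=( "$line" )
done < <(printf '%s\n' one two three)
echo "after redirection: ${#redirected[@]} items"
```

Run as a script, the first echo reports 0 items and the second reports 3. That is why reading with a redirection (done <url_list, or a process substitution as here) is used instead of cat url_list | while ....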
Upvotes: 1