Pierre-Antoine Guillaume

Reputation: 1126

How can I sort URLs in a bash shell so that hostnames are evaluated before subdomain names?

I am given a file (usually the content of a grep) that contains one URL per line.

I am looking for a way to sort the URLs by the following criteria, in order:

  1. Sort by Hostname
  2. Sort by Subdomain name
  3. Sort by Path

Here is an example of a file containing what there is to sort:

www.example.com
www.my-website.com
www.example.org
my-website.com
www.my-website.org

And here is how it should be sorted:

www.example.com
www.example.org
my-website.com
www.my-website.com
www.my-website.org

For now, I use a solution that is quite suboptimal, because it sorts by top-level domain first:

... | rev | sort -u | rev
# note the -u flag on sort: it is optional, but it drops duplicate lines

It should be said that this piece of software is to be used in (foreseeably) two cases:

In both cases, most of the URLs are related to each other.

How can I "smart"-sort these URLs in bash?

Upvotes: 0

Views: 791

Answers (1)

Ipor Sircer

Reputation: 3141

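Put a dot before www-less hostnames with sed, so that the domain is always the second dot-separated field, sort from that field on, then strip the dot again: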

$ sed -e 's/^\([^.]*\.[^.]*\)$/.\1/' dom.txt | sort -t . -k2 | sed -e 's/^\.//'
www.example.com
www.example.org
my-website.com
www.my-website.com
www.my-website.org
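
The same decorate-sort-undecorate idea can probably be pushed a little further to cover deeper subdomains and paths, which the one-liner above does not handle. What follows is only a sketch under assumptions the question does not confirm: the function name `sort_urls` is made up for illustration, the "hostname" is taken to be the last two dot-separated labels of the host (which is wrong for suffixes such as `co.uk`), an optional `scheme://` prefix is ignored when sorting, and plain byte order (`LC_ALL=C`) is acceptable.

```shell
# Hypothetical helper: reads one URL per line on stdin, prints them sorted
# by domain, then subdomain, then path.
sort_urls() {
  awk '{
    url = $0
    sub(/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//, "", url)   # assumption: ignore scheme://
    slash = index(url, "/")                         # first "/" ends the host
    host = slash ? substr(url, 1, slash - 1) : url
    path = slash ? substr(url, slash) : ""
    n = split(host, label, ".")
    # assumption: the domain is the last two labels (not true for co.uk etc.)
    domain = (n >= 2) ? label[n-1] "." label[n] : host
    subdomain = ""
    for (i = 1; i <= n - 2; i++)
      subdomain = subdomain (i > 1 ? "." : "") label[i]
    # decorate: three sort keys, then the untouched original line
    printf "%s\t%s\t%s\t%s\n", domain, subdomain, path, $0
  }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3 |   # sort on the keys
    cut -f4-                                                # undecorate
}
```

It would be used as `sort_urls < dom.txt`. Because identical lines end up next to each other, piping the result through `uniq` should give the same de-duplication as the `-u` flag in the question.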

Upvotes: 2
