Kriss

Reputation: 435

Passing a variable containing special characters to sed in bash

I need to remove subdomains from a file:

.domain.com
.sub.domain.com -- this must be removed
.domain.com.uk
.sub2.domain.com.uk -- this must be removed

So I used sed:

sed '/.\.domain\.com$/d' file
sed '/.\.domain\.com\.uk$/d' file

That part was simple, but when I try to do it in a loop, problems appear:

while read line
do
sed '/\$line$/d' filename > filename   
done < filename

I suppose the problem is the "." and "$" characters. I have tried escaping them in many ways, but I am out of ideas now.

Upvotes: 0

Views: 119

Answers (3)

Jakub Kotowski

Reputation: 7571

A solution inspired by NeronLeVelu's idea:

#!/bin/bash

#set -x

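# Reverse each name and sort bytewise, so a parent such as "moc.niamod."
# sorts before the subdomains that extend it ("moc.niamod.bus.").
mapfile -t domains < <(rev domains | LC_ALL=C sort)

for ((i = 0; i < ${#domains[@]}; i++)); do
    domain=${domains[$i]}
    [ -z "$domain" ] && continue
    # Blank every later entry that starts with this one; the quoted "$domain"
    # is matched literally, so its dots are not regex wildcards.
    for ((j = i + 1; j < ${#domains[@]}; j++)); do
        [[ ${domains[$j]} =~ ^"$domain".+ ]] && domains[$j]=
    done
done

# Write the surviving entries, reversed back to normal order.
for ((i = 0; i < ${#domains[@]}; i++)); do
    [ -n "${domains[$i]}" ] && printf '%s\n' "${domains[$i]}" | rev >> result.txt
done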

For this domains file:

.domain.com
.sub.domain.com
.domain.co.uk
.sub2.domain.co.uk
sub.domain.co.uk
abc.yahoo.com
post.yahoo.com
yahoo.com

you get this result.txt:

.domain.co.uk
.domain.com
yahoo.com
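
With a bytewise sort (LC_ALL=C), all reversed entries that start with the same prefix form one contiguous run right after that prefix, so a single pass that remembers only the last kept entry should be enough. A minimal sketch of that variant, assuming bash and the util-linux rev; strip_subdomains is a hypothetical name. Like the script above, it does not check label boundaries, so it would also treat e.g. myyahoo.com as part of yahoo.com.

```shell
#!/bin/bash

# Hypothetical helper: reads domains on stdin and prints only the entries
# that do not extend a kept entry (plain prefix test on the reversed names).
strip_subdomains() {
  local kept= line
  # Reverse each name and sort bytewise: an entry and everything that
  # extends it end up in one contiguous run.
  rev | LC_ALL=C sort | while IFS= read -r line; do
    [ -z "$line" ] && continue
    # Skip lines that extend the last kept entry; "$kept" is matched literally.
    if [ -n "$kept" ] && [ "$line" != "$kept" ] && [[ $line == "$kept"* ]]; then
      continue
    fi
    kept=$line
    printf '%s\n' "$line"
  done | rev
}

# Example with the sample list from this answer.
printf '%s\n' .domain.com .sub.domain.com .domain.co.uk .sub2.domain.co.uk \
  sub.domain.co.uk abc.yahoo.com post.yahoo.com yahoo.com | strip_subdomains
```

For the sample list it prints the same three lines as result.txt above, in the same order.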

Upvotes: 2

Jakub Kotowski

Reputation: 7571

Your loop is a bit confusing, because you're trying to use sed to delete patterns from a file while taking the patterns from that same file.
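
In the question's loop the sed script is in single quotes, so the shell never substitutes $line: sed receives the literal text /\$line$/d and would only delete lines ending in the string $line. Separately, sed ... filename > filename truncates filename before sed reads it, so the output has to go to a different file. A small sketch comparing the two quoting styles (the value of line is just the sample domain); note that even the double-quoted version leaves the dots unescaped, so each . still matches any character.

```shell
#!/bin/bash

line=.domain.com

# Single quotes: no expansion, sed would get the text /\$line$/d verbatim.
single='/\$line$/d'
# Double quotes: $line is expanded; \$ keeps a literal $ (the sed end anchor).
double="/$line\$/d"

printf '%s\n' "$single" "$double"
```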

If you really want to remove subdomains from filename, you probably need something more like the following:

#!/bin/bash

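set -x

# Work on a copy so that the original list stays untouched.
cp domains domains.tmp

while IFS= read -r domain
do
  # ${domain//./\\.} escapes every dot (".domain.com" -> "\.domain\.com");
  # [[:alnum:]]+ requires a label in front, so only subdomains are deleted.
  sed -r -e "/[[:alnum:]]+${domain//./\\.}$/d" domains.tmp > domains.tmp2
  cp domains.tmp2 domains.tmp
done < dom.txt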

where domains contains:

.domain.com
.sub.domain.com
.domain.co.uk
.sub2.domain.co.uk
sub.domain.co.uk
abc.yahoo.com
post.yahoo.com

and dom.txt contains:

.domain.com
.domain.co.uk
.yahoo.com

Running the script on these inputs results in:

$ cat domains.tmp
.domain.com
.domain.co.uk

Each iteration removes the subdomains of the domain currently read from dom.txt and stores the result in a temporary file, whose contents are filtered further in the next iteration.

It's good to try your scripts with set -x: it prints each command after expansion, so you can see exactly which pattern sed receives.
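
The script escapes only dots, which is enough for domain names. If the variable could contain other regex metacharacters, a small helper can escape them before the value is embedded in the pattern. A minimal sketch, assuming bash and an extended regex (sed -E, or -r as above); escape_ere is a hypothetical name. Its output is meant for ERE only: in a basic regex, GNU sed treats \+, \?, \| and \( as operators.

```shell
#!/bin/bash

# Hypothetical helper: print $1 with every ERE metacharacter (plus the sed
# delimiter /) preceded by a backslash, for use inside sed -E '/.../'.
escape_ere() {
  local s=$1 out= c i
  local specials='\.[*^$+?(){|/'
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    # "$c" is quoted, so it is looked up literally in the specials list.
    if [[ $specials == *"$c"* ]]; then
      out+="\\$c"
    else
      out+=$c
    fi
  done
  printf '%s' "$out"
}

# Usage: delete subdomains of $domain, keeping the entry itself.
domain=.domain.com
printf '%s\n' .domain.com .sub.domain.com | sed -E "/.$(escape_ere "$domain")\$/d"
```

The leading . in the sed pattern requires at least one character before the escaped domain, so the line .domain.com itself is kept and only .sub.domain.com is deleted.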

Upvotes: 0

NeronLeVelu

Reputation: 10039

sed -n 's/.*/²&³/;H
$ {x;s/$/\
/
: again
  s|\(\n\)²\([^³]*\)³\(.*\)\1²[^³]*\2³|\1²\2³\3|
  t again
  s/[²³]//g;s/.\(.*\)./\1/
  p
  }' YourFile

This loads the whole file into the hold space, then iteratively removes any line that ends with an earlier one, and finally prints the result. Temporary edge delimiters (² and ³) are easier to manage in the pattern than \n.

With GNU sed, add --posix -e (the script was tested on AIX).
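
The rule the sed script applies (drop a line if it ends with an earlier line) can also be written in plain bash. A sketch, with drop_suffix_matches as a hypothetical name: it keeps the input order, compares each line literally against the lines kept so far, and skips blank lines, since an empty entry would match every later line.

```shell
#!/bin/bash

# Hypothetical helper: print each stdin line unless it ends with a line
# that was already kept.
drop_suffix_matches() {
  local -a kept=()
  local line k
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    for k in "${kept[@]}"; do
      # "$k" is quoted, so it is compared literally (dots included).
      [[ $line == *"$k" ]] && continue 2
    done
    kept+=("$line")
    printf '%s\n' "$line"
  done
}

# Example with the list from the question.
printf '%s\n' .domain.com .sub.domain.com .domain.com.uk .sub2.domain.com.uk |
  drop_suffix_matches
```

For the question's list this prints .domain.com and .domain.com.uk.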

Upvotes: 2
