Cambolie
Cambolie

Reputation: 1653

Text formating - sed, awk, shell

I need some assistance trying to build up a variable using a list of exclusions in a file.

So I have a exclude file I am using for rsync that looks like this:

*.log
*.out
*.csv
logs
shared
tracing
jdk*
8.6_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
**/lost+found*/
dlxwhsr*
regression
tmp
working
investigation
Investigation
dcsserver_weblogic_
dcswebrdtEAR_weblogic_

I need to build up a string to be used as a variable to feed into egrep -v, so that I can use the same exclusion list for rsync as I do when egrep -v from a find -ls.

So I have created this so far to remove all "*" and "/" - and then when it sees certain special characters it escapes them:

cat exclude-list.supt | while read line
    do
    echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'

What I need the ouput too look like is this and then export that as a variable:

SEXCLUDE_supt="\.log|\.out|\.csv|logs|shared|PR116PICL|tracing|lost\+found|jdk|8\.6\_Code|rpsupport|dbarchive|inarchive|comms|dlxwhsr|regression|tmp|working|investigation|Investigation|dcsserver\_weblogic\_|dcswebrdtEAR\_weblogic\_"

Can anyone help?

Upvotes: 1

Views: 251

Answers (3)

lev.tuby
lev.tuby

Reputation: 477

This should work but I guess there are better solutions. First store everything in a bash array:

SEXCLUDE_supt=$( sed -e 's/\*//g' -e 's/\///g' -e 's/\([.-+_]\)/\\\1/g' exclude-list.supt)

and then process it again to substitute white space:

SEXCLUDE_supt=$(echo $SEXCLUDE_supt |sed 's/\s/|/g')

Upvotes: 0

potong
potong

Reputation: 58371

This might work for you (GNU sed):

SEXCLUDE_supt=$(sed '1h;1!H;$!d;g;s/[*\/]//g;s/\([.-+_]\)/\\\1/g;s/\n/|/g' file)

Upvotes: 2

Chris Seymour
Chris Seymour

Reputation: 85775

A few issues with the following:

cat exclude-list.supt | while read line
    do
    echo $line | sed 's/\*//g' | sed 's/\///g' | 's/\([.-+_]\)/\\\1/g'

Sed reads files line by line so cat | while read line;do echo $line | sed is completely redundant also sed can do multiple substitutions by either passing them as a comma separated list or using the -e option so piping to sed three times is two too many. A problem with '[.-+_]' is the - is between . and + so it's interpreted as a range .-+ when using - inside a character class put it at the end beginning or end to lose this meaning like [._+-].

A much better way:

$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file
\.log
\.out
\.csv
logs
shared
tracing
jdk
8\.6\_Code
rpsupport
dbarchive
inarchive
comms
PR116PICL
lost\+found
dlxwhsr
regression
tmp
working
investigation
Investigation
dcsserver\_weblogic\_
dcswebrdtEAR\_weblogic\_

Now we can pipe through tr '\n' '|' to replace the newlines with pipes for the alternation ready for egrep:

$ sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|"
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...

$ EXCLUDE=$(sed -e 's/[*/]//g' -e 's/\([._+-]\)/\\\1/g' file | tr "\n" "|")

$ echo $EXCLUDE
\.log|\.out|\.csv|logs|shared|tracing|jdk|8\.6\_Code|rpsupport|dbarchive|...

Note: If your file ends with a newline character you will want to remove the final trailing |, try sed 's/\(.*\)|/\1/'.

Upvotes: 4

Related Questions