Sameer Atharkar

Reputation: 442

Sed to remove more than 2 words in a sentence

I am trying to write a sed command whose output shows only the first two words of each entry and nothing more.

echo  "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas"  | sed 's/,/<br>/g; s/:/  #  /g; s/\b\(.\)/\u\1/g'

Expected output :

Test1  #  Pass
Test2  #  Fail
Test3  #  Pass
Test4  #  Pass
Test5  #  Pass
Test6  #  Pass 

I don't want the asfas to be present in the last Test6 line.

Also, the result should always be exactly Pass or Fail. Whatever casing appears in the echo command (PaSS, PAss, FaIl, FAil, and so on) should be replaced with Pass or Fail. Any word that follows Pass or Fail should be removed and not shown.

Can someone suggest a cleaner way to achieve this than what I wrote?

Thanks :)

Upvotes: 0

Views: 745

Answers (5)

Ed Morton

Reputation: 204055

Just use awk. Using any awk in any shell on every Unix box:

$ echo  "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas" |
awk -v RS=',' -F':' -v OFS=' # ' '
    {
        sub(/ .*/,"")
        for (i=1; i<=NF; i++) {
            $i = toupper(substr($i,1,1)) tolower(substr($i,2))
        }
        print
    }
'
Test1 # Pass
Test2 # Fail
Test3 # Pass
Test4 # Pass
Test5 # Pass
Test6 # Pass

Upvotes: 0

Walter A

Reputation: 20022

In your solution you should use \n rather than <BR>, and invoke sed twice.
A small change then removes the remainder of the line.

echo "fOO:paSS,tesT2:fail,TESt:pasS,fdfdhfd:pass,test5:anyresult test,test6:pass asfas"|
  sed -r 's/,/\n/g' | sed -r 's/(.*):(.)(\w*).*/\1 # \u\2\L\3<br>/g'

EDIT:

  1. I first thought only a four-letter word would be parsed. I changed the solution so that it keeps the first word.
  2. OP wants to use this for HTML. I would prefer <pre>...</pre> over parsing text, but I added a <br> at the end of each line.
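
The case normalization in the second sed relies on the GNU sed replacement escapes \u (uppercase the next character) and \L (lowercase what follows). A minimal sketch of just that step, assuming GNU sed; the function name capitalize_word is hypothetical and only there for illustration:

```shell
# capitalize_word (hypothetical name): read lines on stdin and rewrite the
# first word of each line as one capital letter followed by lowercase letters.
# Assumes GNU sed: -r, \w, \u and \L are GNU extensions.
capitalize_word() {
    # \1 is the first character of the word, \2 is the rest of the word.
    sed -r 's/(\w)(\w*)/\u\1\L\2/'
}

printf 'paSS\nFAIL\n' | capitalize_word
```

With GNU sed this should turn paSS into Pass and FAIL into Fail, which is the normalization the question asks for.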

Upvotes: 0

potong

Reputation: 58483

This might work for you (GNU sed):

sed 's/.*/\L&/;s/\w\+/\u&/g;s/:/ # /g;y/,/\n/' file | 
sed 's/\w\+/&\n/2;P;d'

Two invocations of sed.

First invocation:

  • Lowercase everything.
  • Uppercase the first character of each word.
  • Format : to # .
  • Split line into lines on commas.

Second invocation:

  • Split line by a newline after the second word of the line.
  • Print first line of two lines only and delete the other.

N.B. The second invocation may be improved if blank and single word lines are not wanted:

sed -E 's/\w+/&\n/2;Ta;P;:a;d'
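
To see how the two forms of the second invocation differ, here is a small sketch that feeds both the same three lines: one normal line, one blank line and one single-word line. It assumes GNU sed, and the function names are hypothetical wrappers around the commands above:

```shell
# Hypothetical helper names; both read lines on stdin. GNU sed assumed.

# Basic form: a line with fewer than two words is printed unchanged,
# because the substitution fails and P prints the whole pattern space.
keep_two_words() {
    sed 's/\w\+/&\n/2;P;d'
}

# Stricter form: T jumps to label a when the substitution did not happen,
# so lines with fewer than two words are deleted without being printed.
keep_two_words_strict() {
    sed -E 's/\w+/&\n/2;Ta;P;:a;d'
}

printf 'Test6 # Pass Asfas\n\nLonely\n' | keep_two_words
printf 'Test6 # Pass Asfas\n\nLonely\n' | keep_two_words_strict
```

The first call should print all three lines, with the first one cut to Test6 # Pass. The second should print only Test6 # Pass.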

Upvotes: 1

danadam

Reputation: 3450

With more complex input (notice that the unwanted text in test3 contains comma):

test1:PAss,test2:FAil,test3:pass foobar, barfoo,test4:pass,test42:pass,test6:pass asfas

I would do it with 3 invocations of sed and 1 cut. First invocation splits it into lines, second one makes necessary changes and the last one joins lines back with <br>:

echo  "test1:PAss,test2:FAil,test3:pass foobar, barfoo,test4:pass,test42:pass,test6:pass asfas" |
    sed -e 's/,/\n/g' |
    sed -e '/^test[0-9]/ ! d' \
        -e 's/pass/Pass/i' \
        -e 's/fail/Fail/i' \
        -e 's/:/ # /' |
    cut -d' ' -f 1-3 |
    sed ':a; N; $!ba; s/\n/<br>/g'

Or if it is required to use only sed:

echo  "test1:PAss,test2:FAil,test3:pass foobar, barfoo,test4:pass,test42:pass,test6:pass asfas" |
    sed -e 's/,/\n/g' |
    sed -e '/^test[0-9]/ ! d' \
        -e 's/pass/Pass/i' \
        -e 's/fail/Fail/i' \
        -e 's/:/ # /' \
        -e 's/\([[:alnum:]]* # [[:alnum:]]*\).*/\1/' |
    sed ':a; N; $!ba; s/\n/<br>/g'

Output in both cases:

test1 # Pass<br>test2 # Fail<br>test3 # Pass<br>test4 # Pass<br>test42 # Pass<br>test6 # Pass

and rendered as HTML (without code formatting):

test1 # Pass
test2 # Fail
test3 # Pass
test4 # Pass
test42 # Pass
test6 # Pass

  • /^test[0-9]/ ! d removes lines that don't start with test[0-9].
  • s/pass/Pass/i is case insensitive so it matches any "pass" and replaces it with "Pass". Accordingly for "fail".
  • s/\([[:alnum:]]* # [[:alnum:]]*\).*/\1/ captures 2 words separated by # and replaces the whole line with this captured content.
  • :a; N; $!ba; s/\n/<br>/g is taken from https://www.baeldung.com/linux/join-multiple-lines#sed. It defines label a, appends lines to pattern space and lastly replaces \n with <br>.
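
The join step from the last bullet can be tried on its own. A minimal sketch, assuming GNU sed (other sed implementations may not accept a label followed by a semicolon); the function name join_with_br is hypothetical:

```shell
# join_with_br (hypothetical name): read lines on stdin and print them as a
# single line joined by <br>. GNU sed assumed.
join_with_br() {
    # :a     define label a
    # N      append the next input line to the pattern space
    # $!ba   if this is not the last input line, branch back to a
    # s///g  once every line is loaded, replace each newline with <br>
    sed ':a; N; $!ba; s/\n/<br>/g'
}

printf 'test1 # Pass\ntest2 # Fail\ntest3 # Pass\n' | join_with_br
```

Because the whole input is collected in the pattern space before the substitution runs, this suits short inputs like the one here rather than very large files.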

Upvotes: 0

KamilCuk

Reputation: 141493

The following shell command:

$ echo "test1:pass,test2:fail,test3:pass,test4:pass,test5:pass,test6:pass asfas" | sed '
   # replace test[0-9]:(pass or fail) by test[0-9] # (pass or fail).
   # match anything up until an optional comma after, to remove any text after
   # matched globally, so it repeats for each pattern
   s/\(test[0-9]\):\(pass\|fail\)[^,]*,\?/\1 # \2\n/g;
   # apply uppercase to first letters
   s/pass/Pass/gi; s/fail/Fail/gi;
   # The first pattern will add a trailing newline to pattern space
   # remove it
   s/\n*$//
'

would output:

test1 # Pass
test2 # Fail
test3 # Pass
test4 # Pass
test5 # Pass
test6 # Pass

You can have fun learning regex with regex crosswords.

Upvotes: 0
