vnr

Reputation: 11

sed – search for special characters like underscore and replace with dashes

I'm looking for a solution to replace "_" characters with "-".

  383 =>.
  array (                   
    'url' => 'order-samsung_s5-online-en.html',
    'module' => 'product',  
    'action' => 'get',      
    'oid' => '14',          
    'lang' => 'en'

and after replacement it should look like this:

  383 =>.
  array (                   
    'url' => 'order-samsung-s5-online-en.html',
    'module' => 'product',  
    'action' => 'get',      
    'oid' => '14',          
    'lang' => 'en'

There may be other underscore ("_") characters in the text file, but the replacement should only happen within the URLs, between 'url' => 'order- and -online.

Upvotes: 0

Views: 525

Answers (4)

SLePort

Reputation: 15471

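With sed, you can use the t (test) command to process every _ found between your two strings.

t branches only if a substitution has succeeded since the last input line was read (or since the last t), so after each successful replacement ta jumps back to the label :a at the start of the script and the search runs again for the remaining _: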

sed ":a;s/\('url' => 'order-[^_]*\)_\(.*-online\)/\1-\2/;ta;" file
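To see the loop at work, here is a small sketch run on a made-up line (the sample URL and the trailing other_key are assumptions for illustration, not taken from the question). The script is split into -e parts, which should behave like the one-liner above; some sed implementations read everything after : up to the end of the line as the label name, so this form may be more portable.

```shell
# Hypothetical sample line: two underscores inside the URL, one outside it.
line="'url' => 'order-samsung_s5_mini-online-en.html', other_key"

# :a   label marking the start of the loop
# s    replaces the first remaining _ between order- and -online
# ta   jumps back to :a only if that substitution succeeded
result=$(printf '%s\n' "$line" |
  sed -e ':a' -e "s/\('url' => 'order-[^_]*\)_\(.*-online\)/\1-\2/" -e 'ta')

printf '%s\n' "$result"
```

Both underscores inside the URL become dashes, one per pass, while the underscore in other_key is left alone because no -online follows it.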

Upvotes: 2

ctac_

Reputation: 2491

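If I understand the requirement correctly, you can try this sed command (it relies on GNU sed, since \n in the replacement text is not portable):

sed -E '/order.*online/!b;s/order/&\n/;s/online/\n&/;h;s/\n.*\n/\n/;x;s/[^\n]*\n([^\n]*).*/\1/;s/_/-/g;x;G;s/([^\n]*)\n([^\n]*)\n(.*)/\1\3\2/' infile

First, on each line that contains order followed by online, isolate the text between the two by inserting a newline after order and before online. Lines without both words are printed unchanged.

Then replace _ with - in that middle part only.

Finally, reassemble the complete line.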
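The first two steps can be looked at in isolation. The following is only a sketch of the idea, not the full command: it assumes GNU sed, and the sample line (including the _v2 after online) is made up to show that text outside the middle piece is not touched.

```shell
# Assumes GNU sed: \n in the replacement text is a GNU extension.
# Hypothetical sample line; the _v2 after online stays out of the middle piece.
line="    'url' => 'order-samsung_s5-online-en_v2.html',"

# Step 1: put a newline after "order" and before "online", giving three pieces.
pieces=$(printf '%s\n' "$line" | sed -E 's/order/&\n/;s/online/\n&/')

# Show the pieces on one line, separated by | for readability.
printf '%s\n' "$pieces" | paste -sd '|' -

# Step 2: only the middle piece gets the _ to - substitution.
middle=$(printf '%s\n' "$pieces" | sed -n '2p' | sed 's/_/-/g')
printf '%s\n' "$middle"
```

The full command does the same work inside a single sed process, using the hold space (h, x, G) to keep the outer pieces while the middle one is edited.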

cat infile

  383 =>.
  array (                   
    'url' => 'order-samsung_s5-online-en.html',
    'module' => 'product',  
    'action' => 'get',      
    'oid' => '14',          
    'lang' => 'en'

output

  383 =>.
  array (                   
    'url' => 'order-samsung-s5-online-en.html',
    'module' => 'product',  
    'action' => 'get',      
    'oid' => '14',          
    'lang' => 'en'

Upvotes: 0

thanasisp

Reputation: 5975

Here is a solution with awk that replaces the character only between the two strings. I have assumed the first one is always at the beginning of the row.

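awk '/^(\047)url(\047) => (\047)order-/ {
        i=index($0,"-online")                    # position of "-online", or 0 if absent
        if (i > 0) {
                interesting=substr($0,1,i-1)     # awk strings are 1-indexed
                gsub(/_/,"-",interesting)        # replace only in the leading part
                $0=interesting substr($0,i)      # reattach the untouched remainder
        }
} 1' file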
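  • \047 is the octal escape for a single quote; it avoids having to escape a literal quote inside the single-quoted awk script.
  • index() returns the position of the first occurrence of the string in the row (or zero if not found).
  • substr() up to this position gives the interesting part.
  • gsub() replaces every match in the string and stores the result back in that string.
  • 1 is an always-true pattern with the default action, so every row is printed.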
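As a quick check of how index(), substr() and gsub() fit together, here is a minimal sketch on a made-up string (abc_def-online_x is an assumption for illustration, not from the test file). It uses a start position of 1 for substr(), since awk strings are 1-indexed and a start of 0 is not handled the same way by every awk.

```shell
# Hypothetical input: one underscore before -online, one after it.
line="abc_def-online_x"

result=$(printf '%s\n' "$line" | awk '{
    i = index($0, "-online")        # 8 here; 0 when the text is absent
    if (i > 0) {
        head = substr($0, 1, i - 1) # characters 1 .. i-1
        gsub(/_/, "-", head)        # changes head in place
        $0 = head substr($0, i)     # tail from i onward is untouched
    }
    print
}')
printf '%s\n' "$result"
```

Only the underscore before -online is replaced; the one after it survives because it sits in the part returned by substr($0, i).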

test file

some_text
'url' => 'order-word_word1-online-es.html',
'url' => 'order-word_word1_word2-online-es.html',
'url' => 'order-word_word1_word2_word3-online-es.html', some_text
some_text
'url' => 'order-word_offline.html'

output

some_text
'url' => 'order-word-word1-online-es.html',
'url' => 'order-word-word1-word2-online-es.html',
'url' => 'order-word-word1-word2-word3-online-es.html', some_text
some_text
'url' => 'order-word_offline.html'

Upvotes: 1

red-E

Reputation: 1424

Closest I could do was this:

grep -i "^'url' => 'order-" t.txt | sed -e 's/\([[:alnum:]]*\)[_]/\1-/g'

This relies on grep to find the lines starting with 'url' => 'order-, so only those lines appear in the output, and it WILL substitute every underscore found on such a line: it does not stop after -online.
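The limitation is easy to reproduce. The sketch below uses a made-up two-line input (an assumption, not the asker's file) and also shows one possible variant, not part of the answer above, that uses a sed address instead of grep so that non-matching lines are kept. Note that the address is case-sensitive, unlike grep -i.

```shell
# Hypothetical two-line input; the second line has an underscore after -online.
input="some_text
'url' => 'order-word_word1-online-es_x.html',"

# The pipeline from the answer: grep keeps only the matching line.
piped=$(printf '%s\n' "$input" | grep -i "^'url' => 'order-" | sed -e 's/\([[:alnum:]]*\)[_]/\1-/g')
printf '%s\n' "$piped"

# Variant using a sed address instead of grep: other lines pass through unchanged.
addressed=$(printf '%s\n' "$input" | sed "/^'url' => 'order-/s/_/-/g")
printf '%s\n' "$addressed"
```

In both cases the underscore in es_x is replaced as well, which is the behaviour the answer warns about.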

Upvotes: 0
