micron cloud

Reputation: 93

Using sed to make multiple replacements based on a condition in a file

I have a text file containing numeric data. Some fields are a lone hyphen -, which I need to replace with 0, and the numbers end in MB, which also needs to be removed so that I'm left with only numbers.

Below is sample data in a file called file1:

Data:

$ cat file1

 3708MB 5073MB 5153MB  0MB
 -    63097MB 9939MB  53376MB
 -    817MB   681MB   271MB
 -    2655MB   692MB   2112MB

What i have tried:

$ /bin/sed   's/\r//g; s/-/0/g; s/MB//g' tt4
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112

Or, to align the columns better, pipe the output through the column command:

$ /bin/sed   's/\r//g; s/-/0/g; s/MB//g' tt4| column -t
3708  5073   5153  0
0     63097  9939  53376
0     817    681   271
0     2655   692   2112

Is there a better way to make sure that only a standalone hyphen - (one with nothing directly before or after it) is replaced, and likewise that MB is removed only when it appears at the end of a number?

Upvotes: 6

Views: 414

Answers (5)

Just Khaithang

Reputation: 1545

You have to think about how to capture the pattern(s) uniquely, so as to isolate them from any other occurrence of the same text.

Here, - seems to be surrounded by blank spaces, so you can use that to distinguish it from, say, any other text containing - (e.g. text-text).

sed 's/ - / 0 /g'

For the pattern MB, you can make sure you only match it when it follows some digits:


sed -r 's/([0-9]+)MB/\1/g' 

So together you can write:

sed -r 's/ - / 0 /g;s/([0-9]+)MB/\1/g' 
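A minimal sketch of the combined command run on part of the sample data, plus one made-up line containing text-text and a word ending in MB, to show what the guards leave alone. It assumes GNU sed (-r is a GNU option; BSD sed spells it -E), and the function name clean_sizes is hypothetical, used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper wrapping the combined command from this answer.
# Assumes GNU sed (-r enables extended regular expressions).
clean_sizes() {
    sed -r 's/ - / 0 /g;s/([0-9]+)MB/\1/g'
}

# Two lines of the sample data, plus a made-up line showing that
# "text-text" and "noMB" are not touched.
printf '%s\n' \
    ' 3708MB 5073MB 5153MB  0MB' \
    ' -    63097MB 9939MB  53376MB' \
    ' text-text noMB 12MB' | clean_sizes
```

The last line comes out as " text-text noMB 12". Because ' - ' needs a space on both sides, a hyphen at the very start or end of a line, or the second of two hyphens separated by a single space, is not replaced; that is fine for the data in the question, where every line starts with a space.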

Upvotes: 5

jhnc

Reputation: 16652

Similar to the other answers but perhaps more portable:

sed '
    s/[[:space:]]\{1,\}/  /g
    s/^/ /
    s/$/ /
    s/ - / 0 /g
    s/ \([0-9]\{1,\}\)MB / \1 /g
' tt4 | column -t

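I've added whitespace guards around the MB numbers too. Each guard consumes one space on either side of a field, so adjacent fields need at least two spaces between them. That's why I've replaced the \r substitution with a more general one that turns every run of whitespace (including \r) into two spaces.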

Adding a space at the beginning and end of each line means the alternation operator \| is not required; using it broke the code on FreeBSD.
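To see why the guards need two spaces between fields, here is a small sketch comparing the guarded MB substitution on single-spaced input with the same input after the normalization steps above. It should work with GNU or BSD sed; the input lines and the function name strip_mb are made up for illustration. With one space between fields, each match swallows the space the next field needs, so only every other field gets cleaned.

```shell
#!/bin/sh
# The guarded MB substitution from this answer on its own
# (strip_mb is a hypothetical name).
strip_mb() {
    sed 's/ \([0-9]\{1,\}\)MB / \1 /g'
}

# Single spaces: the match for " 1MB " consumes the space before "2MB",
# so "2MB" is skipped and the output is " 1 2MB 3 ".
printf '%s\n' ' 1MB 2MB 3MB ' | strip_mb

# Normalize every whitespace run to two spaces and pad both ends first,
# as the answer does; now every field keeps its own guards: " 1  2  3 ".
printf '%s\n' '1MB 2MB 3MB' |
    sed 's/[[:space:]]\{1,\}/  /g; s/^/ /; s/$/ /' | strip_mb
```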


Or there's awk (which is probably easier to read):

awk '{
    for (i=1; i<=NF; i++) {
        if ($i=="-") $i=0
        if ($i~/^[0-9]+MB$/) sub("MB","",$i)
    }
    print
}' tt4 | column -t
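The question's own commands strip \r, which suggests the file may have Windows (CRLF) line endings. If so, this awk version would miss the last field on each line: awk's default field splitting uses spaces, tabs and newlines, not carriage returns, so the last field is e.g. "271MB\r" and fails the ^[0-9]+MB$ test. A minimal sketch of a variant that drops a trailing \r first, assuming a POSIX awk; the function name clean_sizes_awk is hypothetical.

```shell
#!/bin/sh
# Variant of the awk answer that first drops a trailing carriage return.
clean_sizes_awk() {
    awk '{
        sub(/\r$/, "")                        # strip CR; $0 is re-split into fields
        for (i = 1; i <= NF; i++) {
            if ($i == "-") $i = 0             # field that is exactly "-"
            if ($i ~ /^[0-9]+MB$/) sub("MB", "", $i)  # digits followed by MB
        }
        print
    }'
}

printf ' -    817MB   681MB   271MB\r\n' | clean_sizes_awk
```

Assigning to a field makes awk rebuild the line with single spaces (the default OFS), so this prints "0 817 681 271"; piping through column -t still aligns the columns as in the answer.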

Upvotes: 4

Ed Morton

Reputation: 203169

Using GNU or BSD sed for -E, this may do what you want:

$ sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g' file
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112
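Two details of this command are easy to miss. First, sed backreferences only go up to \9, so \10 in the replacement means backreference \1 followed by a literal 0. Second, because the trailing space captured by ( |$) is consumed, two hyphens separated by a single space are not both replaced in one pass. A small sketch with made-up input lines, assuming GNU or BSD sed for -E; the function name clean_sizes_ere is hypothetical.

```shell
#!/bin/sh
# This answer's command wrapped in a hypothetical helper.
clean_sizes_ere() {
    sed -E 's/(^| )-( |$)/\10\2/g; s/([0-9])MB( |$)/\1\2/g'
}

# A hyphen at the very start of the line is handled by the ^ alternative;
# \10 expands to the (empty) group 1 followed by "0": prints "0 5 7".
printf '%s\n' '- 5MB 7MB' | clean_sizes_ere

# Two hyphens separated by one space: the first match consumes the space,
# so the second hyphen is left alone: prints "1 0 - 2".
printf '%s\n' '1MB - - 2MB' | clean_sizes_ere
```

Running the hyphen substitution a second time would catch the remaining hyphen; the question's data never has two hyphens in a row, so this doesn't matter there.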

Upvotes: 5

sseLtaH

Reputation: 11207

Using sed

$ sed -Ez ':a;s/([0-9]+)MB/\1/;s/(\n )-/\10/;ta' input_file
 3708 5073 5153  0
 0    63097 9939  53376
 0    817   681   271
 0    2655   692   2112
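How this works: -z (a GNU sed extension) separates records by NUL instead of newline, so a normal text file becomes a single pattern space with its newlines inside it. :a sets a label, each s command (without g) makes one replacement, and ta jumps back to a as long as some substitution succeeded, so the loop repeats until nothing is left to change. Since the hyphen pattern requires a preceding newline, a hyphen on the very first line of the file is not replaced. A small sketch with made-up input lines; the function name clean_sizes_z is hypothetical.

```shell
#!/bin/sh
# This answer's command in a hypothetical helper (GNU sed: -E and -z).
clean_sizes_z() {
    sed -Ez ':a;s/([0-9]+)MB/\1/;s/(\n )-/\10/;ta'
}

# The second line starts with " -" after a newline, so it is replaced:
# prints " 1 2" and " 0    3".
printf '%s\n' ' 1MB 2MB' ' -    3MB' | clean_sizes_z

# A hyphen on the first line has no "\n" before it, so it stays: " -    3".
printf '%s\n' ' -    3MB' | clean_sizes_z
```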

Upvotes: 2

Sparrow

Reputation: 148

Yes, there is a way to do each of the things you asked for.

sed   's/\r//g; s/\b-\b/0/g; s/\([0-9]*\)MB/\1/g' bla.txt | column -t
  1. Using \b matches only the whole word, which in your case is -; see the example below.
    $ echo "bla blablabla" | sed "s/bla/replace/g"
    replace replacereplacereplace
    $ echo "bla blablabla" | sed "s/\bbla\b/replace/g"
    replace blablabla
  2. Using \( and \) around [0-9]* matches MB after numbers, as you asked for.

So,

$ cat bla.txt
 3708MB 5073MB 5153MB  0MB
 -    63097MB 9939MB  53376MB
 -    817MB   681MB   271MB
 -    2655MB   692MB   2112MB
$ sed   's/\r//g; s/\b-\b/0/g; s/\([0-9]*\)MB/\1/g' bla.txt | column -t
3708  5073   5153  0
-     63097  9939  53376
-     817    681   271
-     2655   692   2112
$
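Note that in the output above the hyphens are still there. In GNU sed, \b matches only between a word character (letter, digit or underscore) and a non-word character. A standalone - has a space or the line start on both sides, which are non-word characters too, so \b-\b never matches it. Likewise, [0-9]* can match zero digits, so that substitution removes MB even when no number precedes it. A sketch of one possible fix, assuming GNU sed (for -E and \r): replace the word boundaries with explicit whitespace/line-edge guards, as the other answers do, and require at least one digit before MB. The function name clean_sizes_fixed is hypothetical.

```shell
#!/bin/sh
# Hypothetical corrected version of this answer's command (GNU sed).
clean_sizes_fixed() {
    sed -E 's/\r//g
        s/(^|[[:space:]])-([[:space:]]|$)/\10\2/g
        s/([0-9]+)MB([[:space:]]|$)/\1\2/g'
}

# Prints " 0    817   681   271" and then "noMB 5".
printf '%s\n' ' -    817MB   681MB   271MB' 'noMB 5MB' | clean_sizes_fixed
```

Piping the result through column -t aligns it exactly as in the question.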

Upvotes: 3
