Alan De Moin

Reputation: 19

Delete words in a line using grep or sed

I want to delete the three words that contain a special character from a line such as this:

Input:

\cf4 \cb6 1749,1789 \cb3 \

Output:

1749,1789

I have tried a couple of sed and grep commands, but so far none have worked, mainly because of the \ character.

My unsuccessful attempt:

sed -i 's/ [.\c ] //g' inputfile.ext >output file.ext

Upvotes: 0

Views: 1663

Answers (4)

Joshua

Reputation: 43278

My guess is you are having trouble because you have backslashes in the input and can't figure out how to get backslashes into your regex. Since the backslash is an escape character to both the shell and the regex engine, inside double quotes you end up having to type four backslashes to get one literal backslash into your regex.
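To illustrate the quoting point, here is a minimal sketch using the sample line from the question. The variable names `line`, `single`, and `double` are only for illustration. Both sed commands strip every backslash; the only difference is how many backslashes the shell consumes before sed sees the expression.

```shell
# Sample line from the question; single quotes keep the shell from touching the backslashes.
line='\cf4 \cb6 1749,1789 \cb3 \'

# Single quotes: the shell passes \\ through unchanged, and sed reads \\ as one literal backslash.
single=$(printf '%s\n' "$line" | sed 's/\\//g')

# Double quotes: the shell first reduces \\\\ to \\, which sed then reads as one literal backslash.
double=$(printf '%s\n' "$line" | sed "s/\\\\//g")

printf '%s\n' "$single"
printf '%s\n' "$double"
```

Both commands should print the same line with the backslashes removed, which is why single quotes are usually the less error-prone choice.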

Ben Van Camp already posted an answer that uses single quotes to make the escaping a little easier; however I shall now post an answer that simply avoids the problem altogether.

grep -o '[0-9]*,[0-9]*' | tr , .

This locks onto the comma, selects the digits on either side, and outputs the number. Alternatively, if the comma is not guaranteed, we can do it this way:

grep -Eo '(^| )[0-9,]+' | tr , . | tr -d ' '

Both of these assume there's only one usable number per line.
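As a rough sketch of how the first pipeline might be run against a file, the snippet below writes the sample line from the question to a temporary file and extracts the number. The temporary file and the variable names `tmp` and `result` are assumptions made for the example, not part of the question.

```shell
# Write the sample line from the question to a temporary file (illustrative path).
tmp=$(mktemp)
printf '%s\n' '\cf4 \cb6 1749,1789 \cb3 \' > "$tmp"

# -o prints only the matched text; tr then swaps the comma for a period.
result=$(grep -o '[0-9]*,[0-9]*' "$tmp" | tr , .)
printf '%s\n' "$result"   # prints 1749.1789

rm -f "$tmp"
```

If a line held two comma-separated numbers, grep -o would print each match on its own line, which is the limitation noted above.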

Upvotes: 0

Ed Morton

Reputation: 203493

$ awk '{sub(/,/,".",$3); print $3}' file
1749.1789

$ sed 's/\([^ ]* \)\{2\}\([^ ]*\).*/\2/; s/,/./' file
1749.1789

Upvotes: -2

Ben Van Camp

Reputation: 136

The backslash is a special meta-character that confuses bash.

We treat it like any other meta-character, by escaping it, with--you guessed it--a backslash!

But first, we need to grep this pattern out of our file

grep -E '\\... \\... [0-9]+,[0-9]+ \\... \\' our_file # Close enough! -E is needed so that + means "one or more"

Now, just sed out those pesky backslashes

| sed -e 's/\\//g' # Don't forget the g, otherwise it'll only strip out the first backslash

Now, finally, sed out the clusters of 2 alpha followed by a number and a space!

| sed -e 's/[a-z][a-z][0-9] //g'

And, finally....

grep -E '\\... \\... [0-9]+,[0-9]+ \\... \\' our_file | sed -e 's/\\//g' | sed -e 's/[a-z][a-z][0-9] //g'

Output:

1749,1789
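For comparison, the same idea can probably be folded into a single sed call with a capture group. This is a different technique from the grep-then-sed pipeline above, offered only as a sketch: it assumes a sed that supports -E (GNU and BSD sed both do), and the variable names `line` and `result` are illustrative.

```shell
# Sample line from the question.
line='\cf4 \cb6 1749,1789 \cb3 \'

# -n with the p flag prints only lines where the substitution succeeded,
# so sed does the filtering that grep did in the pipeline.
# \\ matches one literal backslash; the parentheses capture the number pair as \1.
result=$(printf '%s\n' "$line" |
  sed -n -E 's/^\\[a-z]+[0-9]+ \\[a-z]+[0-9]+ ([0-9]+,[0-9]+) \\[a-z]+[0-9]+ \\$/\1/p')

printf '%s\n' "$result"   # prints 1749,1789
```

Because the pattern is anchored at both ends, lines that do not have exactly this shape are skipped rather than partially rewritten.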

Upvotes: 1

vintnes

Reputation: 2030

Awk accepts a regex Field Separator (in this case, comma or space):

$ awk -F'[ ,]' '$0 = $3 "." $4' <<< '\cf4 \cb6 1749,1789 \cb3 \'
1749.1789
  • -F'[ ,]' - Use a single character from the set space/comma as Field Separator
  • $0 = $3 "." $4 - Set the entire line $0 to Field 3 ($3), followed by a literal period ".", followed by Field 4 ($4); since the result is a non-empty string, the pattern is true and awk does the default behavior (print entire line)

Replace <<< 'input' with file if every line of that file has the same delimiters (spaces/comma) and number of fields. If your input file is more complex than the sample you shared, please edit your question to show actual input.
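If the file does mix in lines of a different shape, one possible guard is to print only when fields 3 and 4 are both numeric. This is a sketch under that assumption; the second input line, the temporary file, and the variable names `tmp` and `result` are made up for the example.

```shell
# Build a small test file: the sample line plus a hypothetical line of another shape.
tmp=$(mktemp)
printf '%s\n' '\cf4 \cb6 1749,1789 \cb3 \' 'some other line' > "$tmp"

# Same field separator as above, but print only when $3 and $4 consist solely of digits.
result=$(awk -F'[ ,]' '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { print $3 "." $4 }' "$tmp")
printf '%s\n' "$result"   # prints 1749.1789

rm -f "$tmp"
```

The second line is skipped because its third field is not a number.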

Upvotes: 5
