B.Hunt

Reputation: 11

How to remove punctuation from the start and end of a word with sed in Linux?

I am trying to figure out how many times each word occurs in a file on Linux.

I have placed each word from my file onto a new line by using the code below.

sed -i 's/ /\n/g' books2 

I am now trying to strip punctuation from the start and end of each word, since some words contain punctuation. My current attempt is below, but it does not seem to work. Once this is fixed, I will be able to run a command that counts the words and returns a list of counts. Can someone show me the correct way to remove the punctuation?

sed -i 's/\([^[:alpha:]]\)$//' books2 #this is my attempt to remove the punctuation at the end of the word


sed -i 's/\([^[:alpha:]]\)^.*//' books2 #this is my attempt to remove the punctuation from the front.

When I run either of the lines of code above my file becomes empty. Why is this?

Upvotes: 1

Views: 1721

Answers (1)

Secespitus

Reputation: 709

To remove the punctuation from the beginning of the line you can use the following command:

sed 's/^[^[:alpha:]]\+//' books2

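This removes every run of non-alphabetic characters at the beginning of a line. The \+ quantifier (a GNU sed extension) matches one or more such characters, whereas your example would match only one. Without -i the result is printed to standard output and the file is left unchanged.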

To remove the punctuation from the end of the line you can use the following command:

sed 's/[^[:alpha:]]\+$//' books2
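
Both substitutions can also be applied in a single sed call by passing each one with -e. The snippet below is only a sketch: strip_punct is a made-up helper name, the sample words are invented for illustration, and \+ assumes GNU sed as found on most Linux systems.

```shell
# Hypothetical helper (not from the question): strip leading and trailing
# non-letters from every line read on standard input.
strip_punct() {
    # First expression trims the front of the line, second trims the end.
    sed -e 's/^[^[:alpha:]]\+//' -e 's/[^[:alpha:]]\+$//'
}

# Invented sample input, one word per line.
printf '%s\n' '"Hello,' "(don't)" 'world!!' | strip_punct
```

Punctuation inside a word, such as the apostrophe in don't, is left alone because both patterns are anchored to the start or end of the line.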

If there is no punctuation inside the words, you can also run:

sed 's/[^[:alpha:]]\+//g' books2

to remove all non-alphabetic characters in one command. The g flag is needed so that every match on a line is replaced, not only the first one.
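
For the word count the question is ultimately after, the stripping step could be combined with sort and uniq. This is one possible sketch rather than the only way to do it: count_words is a hypothetical helper name, the sample sentence is invented, the count is case-sensitive, and \+ again assumes GNU sed.

```shell
# Hypothetical helper (not from the question): read text on standard input
# and print "count word" pairs, highest count first.
count_words() {
    # Put each whitespace-separated token on its own line.
    tr -s '[:space:]' '\n' |
        # Strip leading and trailing non-letters from each token.
        sed -e 's/^[^[:alpha:]]\+//' -e 's/[^[:alpha:]]\+$//' |
        # Drop tokens that were punctuation only and are now empty.
        grep -v '^$' |
        # Group identical words, count them, and sort by count.
        sort | uniq -c | sort -rn
}

# Invented sample input.
printf '%s\n' 'The cat, the "cat" and the dog.' | count_words
```

To run it on the file from the question, use count_words < books2. Because tr does the splitting inside the pipeline, the file does not have to be rewritten with sed -i first.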

Upvotes: 5
