Brandon Bertelsen

Reputation: 44638

MS Word Doc: Automating find/replace using Shell Scripts

I have a number of Word documents that I'd like to remove some elements from. What I would like to do is as follows:

  1. Copy the entire contents of the Word file into a text file (copy and paste may not be necessary), OR convert the .doc to .txt
  2. Using regex: replace \[.*\] with "" AND replace \(.*\) with ""
  3. Save the result to a text file with the same name as the original word document.

Thoughts and direction appreciated. I don't know how to do any of these steps programmatically, so for now I'm doing them manually.

If it matters, I'm using Ubuntu 11.04

Upvotes: 1

Views: 1436

Answers (1)

jman

Reputation: 11586

Since you're open to using plain text, some improvements to your algo:

  1. Use antiword to automate the conversion from .doc to .txt
  2. Use sed to do in-place regex modification: sed -i -e 's/bad/good/' file.txt
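
The conversion step could be sketched roughly as below. This is only an illustration, not a tested recipe for your files: the function name doc_to_txt and the CONVERTER variable are made up for this example, and it assumes antiword is installed and writes the extracted text to standard output.

```shell
#!/bin/sh
# Convert every .doc given as an argument to a .txt with the same base name.
# CONVERTER is a hypothetical override hook for this sketch; it defaults to
# antiword, which is assumed to print the document text on stdout.
doc_to_txt() {
    for doc in "$@"; do
        txt="${doc%.doc}.txt"          # report.doc -> report.txt
        "${CONVERTER:-antiword}" "$doc" > "$txt" || echo "failed: $doc" >&2
    done
}
```

You would call it as doc_to_txt *.doc from the directory holding the documents, then run sed on the resulting .txt files.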

Update (in response to comment):

The regexes are fine, but I didn't understand the objective completely:

  • if you want to replace occurrences of [foo] & (foo) with "" use:

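    sed -i -e 's/\[.*\]/""/g' -e 's/(.*)/""/g' file.txt

    Note that in sed's default (basic) regex syntax, plain ( and ) match literal parentheses, while \( and \) delimit a capture group, so the parentheses must not be escaped here. If you want to delete the match instead of inserting a literal pair of quotes, leave the replacement empty, e.g. s/\[.*\]//g.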

  • if you want to replace occurrences of [foo] & (foo) with "foo" each, use:

    sed -i -e's/\[\(.*\)\]/"\1"/g' file.txt; sed -i -e's/(\(.*\))/"\1"/g' file.txt
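
One caveat with .* is that it is greedy: on a line containing two bracketed spans, it matches from the first [ to the last ], taking the text in between with it. A possible variant, assuming the goal is to delete each span separately and that spans are not nested, uses negated character classes instead. The function name strip_annotations is invented for this sketch.

```shell
#!/bin/sh
# Delete every [...] and (...) span from text read on stdin.
# [^]]* and [^)]* stop at the first closing delimiter, so two spans on the
# same line are removed separately. Nested spans are not handled.
strip_annotations() {
    sed -e 's/\[[^]]*\]//g' -e 's/([^)]*)//g'
}

# Example: printf 'a [b] c (d) e\n' | strip_annotations
```

The same two expressions can be passed to sed -i to edit a .txt file in place. The surrounding spaces are left untouched, so removed spans may leave double spaces behind.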

Upvotes: 2
