Is there any Windows app that will search for a string of text within fields in a Word (DOCX) document? Apps like Agent Ransack and its big brother FileLocator Pro can find strings in the Word docs but seem incapable of searching within fields.
For example, I would like to be able to find all occurrences of the string "getProposalTranslations" within a collection of Word documents that have fields with syntax like this:
{ AUTOTEXTLIST \t "<wr:out select='$.shared_quote_info' datasource='getProposalTranslations'/>" }
Note that the string doesn't appear in the text of the document itself, only within a field. I believe a DOCX file is essentially just a zip file, so if there's a tool that can grep within archives, that might work. Note also that I need to search across hundreds or perhaps thousands of files in many directories, so unzipping the files one by one by hand isn't feasible. I haven't found anything on my own and thought I'd ask here. Thanks in advance.
This script should accomplish what you are trying to do; let me know if it doesn't. It is a POSIX shell script and needs unzip and xmllint (part of libxml2) to be installed. I don't usually write entire scripts because it can hurt the learning process, so I have commented each command so that you can learn from it.
#!/bin/sh
# Temporary folder for the extracted XML; create it if it doesn't exist already
WORKDIR="$HOME/tmp/WORDXML"
mkdir -p "$WORKDIR"
# Iterate through each file passed to this script ("$@" keeps names with spaces intact)
for FILE in "$@"; do
    # unzip it into the temporary folder: -q silences unzip, -o overwrites without
    # prompting, 2> /dev/null discards errors. Extracting with -d instead of
    # cd-ing into the folder keeps relative paths in $FILE valid.
    unzip -q -o "$FILE" -d "$WORKDIR" 2> /dev/null
    # find all of the xml files (-print0 / xargs -0 copes with spaces in names)
    find "$WORKDIR" -type f -name '*.xml' -print0 | \
    # open them in xmllint to make them pretty. Discard errors.
    xargs -0 xmllint --recover --format 2> /dev/null | \
    # search for and report if found
    grep 'getProposalTranslations' && echo " [^ found in file '$FILE']"
    # remove the temporary contents before the next file
    rm -rf "$WORKDIR"/*
done
# remove the temporary folder
rm -rf "$WORKDIR"
Save the script wherever you like and give it any name; I'll call it docxfind. Make it executable by running chmod +x docxfind, then run it like this (assuming your terminal is in the same directory as the script): ./docxfind filenames...