Reputation: 47
I have a list of words I need to check in more than one hundred text files.
My word list file is named word2search.txt.
This text file contains N words:
Word1
Word2
Word3
Word4
Word5
Word6
Wordn
So far I've written this bash script:
#!/bin/bash
listOfWord2Find=/home/mobaxterm/MyDocuments/word2search.txt
while IFS= read -r listOfWord2Find
do
    echo "$listOfWord2Find"
    grep -l -R "$listOfWord2Find" /home/mobaxterm/MyDocuments/txt/*.txt
    echo "================================================================="
done <"$listOfWord2Find"
The result does not satisfy me; I can hardly use it:
Word1
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
/home/mobaxterm/MyDocuments/txt/file2.txt
/home/mobaxterm/MyDocuments/txt/file3.txt
=================================================================
Word2
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file4.txt
/home/mobaxterm/MyDocuments/txt/file5.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word4
/home/mobaxterm/MyDocuments/txt/new 6.txt
/home/mobaxterm/MyDocuments/txt/file1.txt
=================================================================
Word5
/home/mobaxterm/MyDocuments/txt/new 6.txt
=================================================================
This is what I want to see:
/home/mobaxterm/MyDocuments/txt/file1.txt : Word1, Word2, Word3, Word4
/home/mobaxterm/MyDocuments/txt/file2.txt : Word1
/home/mobaxterm/MyDocuments/txt/file3.txt : Word1
/home/mobaxterm/MyDocuments/txt/file4.txt : Word3
/home/mobaxterm/MyDocuments/txt/file5.txt : Word3
/home/mobaxterm/MyDocuments/txt/new 6.txt : Word1, Word2, Word3, Word4, Word5, Word6
I do not understand why my script doesn't show me Word6 (there are files that contain this word); it stops at Word5. To work around this issue, I've added an extra line, blablabla, at the end of the list (a word I'm sure will never be found).
If you can help me on this subject :) Thank you.
Upvotes: 0
Views: 659
Reputation: 158090
Just grep:
grep -f list.txt input.*.txt
-f FILENAME
lets grep read its search patterns from the given file.
If you want to display the filename along with the match, pass -H in addition to that:
grep -Hf list.txt input.*.txt
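To get from there to the exact "file : word1, word2" layout requested in the question, one option is to print only the matched words with -o and group them per file. This is just a minimal sketch, assuming one word per line in word2search.txt and no ':' in the file names:
grep -oHf word2search.txt /home/mobaxterm/MyDocuments/txt/*.txt | sort -u |
awk -F: '{
    if ($1 != prev) {              # a new file starts a new output line
        if (prev != "") print ""
        printf "%s : %s", $1, $2
        prev = $1
    } else {
        printf ", %s", $2          # same file: append the next matched word
    }
}
END { if (prev != "") print "" }'
If the words should only match as whole words, adding grep's -w flag tightens this further.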
Upvotes: 0
Reputation: 4890
Another, more elegant approach: search for all the words in each file, one file at a time.
Use the grep multi-pattern option -f, --file=FILE, and print only the matched words with -o, --only-matching.
Then pipe the resulting words through sort, uniq, awk, and sed to massage them into a CSV list.
Like this:
#!/bin/bash
for currFile in "$@"; do    # quote "$@" so file names with spaces survive word splitting
    matched_words_list=$(grep --only-matching --file="$WORDS_LIST" "$currFile" | sort | uniq | awk -v ORS=', ' 1 | sed "s/, $//")
    printf "%s : %s\n" "$currFile" "$matched_words_list"
done
Pass the word list file in the environment variable WORDS_LIST, and the inspected files as the argument list input.*.txt:
export WORDS_LIST=./words.txt; ./script.sh input.*.txt
input.1.txt : word1, word2
input.2.txt : word4
input.3.txt :
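Because the loop iterates over "$@" with quotes, file names containing spaces (like new 6.txt from the question) are handled as well. Applied to the question's setup, a hypothetical invocation would be:
export WORDS_LIST=/home/mobaxterm/MyDocuments/word2search.txt; ./script.sh /home/mobaxterm/MyDocuments/txt/*.txt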
The sample word list and the first input file used in this demo (one word per line):
words.txt:
word2
word1
word5
word4

input.1.txt:
word1
word2
word3
word3
word1
word3
Breaking the pipeline down, step by step. The complete command and its result:
grep --file=words.txt -o input.1.txt |sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
List all matched words from words.txt in the inspected file input.1.txt:
grep --file=words.txt -o input.1.txt
word1
word2
word1
List all matched words from words.txt in the inspected file input.1.txt.
Then sort the output words list:
grep --file=words.txt -o input.1.txt|sort
word1
word1
word2
List all matched words from words.txt in the inspected file input.1.txt.
Then sort the output words list.
Then remove duplicate words:
grep --file=words.txt -o input.1.txt|sort|uniq
word1
word2
List all matched words from words.txt in the inspected file input.1.txt.
Then sort the output words list.
Then remove duplicate words.
Then create a CSV list from the unique words:
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1
word1,word2,
List all matched words from words.txt in the inspected file input.1.txt.
Then sort the output words list.
Then remove duplicate words.
Then create a CSV list from the unique words.
Then remove the trailing , from the CSV list:
grep --file=words.txt -o input.1.txt|sort|uniq|awk -vORS=, 1|sed s/,$//
word1,word2
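As an aside, the same CSV list can be produced with a slightly shorter pipeline; a sketch relying on sort -u (sort and deduplicate in one step) and paste -s (join all lines with a delimiter), both standard GNU options:
grep --file=words.txt -o input.1.txt | sort -u | paste -sd, -
word1,word2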
Upvotes: 2
Reputation: 4890
The suggested strategy is to scan each line only once, checking all the words against it.
I suggest writing a gawk script (gawk is the standard awk on Linux; the ENDFILE rule used below is a gawk extension):
FNR == NR { # only in the first file, holding the match-words list
    matchWordsArr[++wordsCount] = $0; # read match words into an ordered array
    matchedWordInFile[wordsCount] = 0; # reset matchedWordInFile array
}
FNR != NR { # read a line in an inspected file
    for (i = 1; i <= wordsCount; i++) { # scan the line for every match word, in list order
        if ($0 ~ matchWordsArr[i]) matchedWordInFile[i]++; # if the word matched, increment its counter
    }
}
ENDFILE { # on each file's read completion
    if (FNR != NR) { # if not the first file
        outputLine = sprintf("%s: ", FILENAME); # start outputLine with the current file name
        for (i = 1; i <= wordsCount; i++) { # iterate over the words in list order
            if (matchedWordInFile[i] == 0) continue; # skip unmatched words
            outputLine = sprintf("%s%s%s", outputLine, separator, matchWordsArr[i]); # append matched word to outputLine
            matchedWordInFile[i] = 0; # reset the counter for the next file
            separator = ","; # set the words list separator ","
        }
        print outputLine;
    }
    outputLine = separator = ""; # reset outputLine and the separator
}
The sample files used to test the script (one word per line):

words.txt:
word2
word1
word5
word4

input.1.txt:
word1
word2
word3

input.2.txt:
word3
word4
word5

input.3.txt:
word3
word7
word8

Save the script as script.awk, then run:
$ awk -f script.awk words.txt input.*.txt
input.1.txt: word2,word1
input.2.txt: word5,word4
input.3.txt:
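One caveat: $0 ~ matchWordsArr[i] treats each word as a regular expression and also matches it as a substring, so word1 would be counted in a file that only contains word10. A minimal sketch of a stricter scanning rule, using gawk's \< and \> word-boundary operators:
FNR != NR { # read a line in an inspected file
    for (i = 1; i <= wordsCount; i++) {
        # anchor the word on both sides so word1 no longer matches word10
        if ($0 ~ ("\\<" matchWordsArr[i] "\\>")) matchedWordInFile[i]++;
    }
}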
Upvotes: 0