profiler

Reputation: 627

find + sed, filename output

I have a directory, D:/Temp, that contains a lot of subfolders with text files. Each folder has a "file.txt". Some of the file.txt files contain the word "pattern". I would like to check how many "pattern" words there are, and also get the file path of each matching file.txt:

find D:/Temp -type f -name "file.txt" -exec basename {} cat {}  \; | sed -n '/pattern/p' | wc -l

Output should be:

4
D:/Temp/abc1/file.txt
D:/Temp/abc2/file.txt
D:/Temp/abc3/file.txt
D:/Temp/abc4/file.txt

Or similar.

Upvotes: 2

Views: 4452

Answers (8)

Ed Morton

Reputation: 204228

If your file names don't contain spaces then all you need is:

awk '/pattern/{print FILENAME; cnt++; nextfile} END{print cnt+0}' $(find D:/Temp -type f -name "file.txt")

The above uses GNU awk for nextfile.

Upvotes: 1

ghoti

Reputation: 46876

The way I'm reading your question, I'm going to answer as if:

  • some but not all file.txt files contain pattern,
  • you want a list of the paths leading to file.txt with pattern, and
  • you want a count of pattern in each of those files.

There are a few options. (Always multiple ways to do anything.)

If your bash is version 4 or higher, you can use globstar to recurse through directories:

shopt -s globstar

for file in **/file.txt; do
  if count=$(grep -c 'pattern' "$file"); then
    printf "%d %s\n" "$count" "${file%/*}"
  fi
done

This works because the if evaluation considers a failed grep (i.e. zero occurrences) to be FALSE, and thus does not print results.
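As a rough illustration of that exit-status behaviour, here is a small sketch run against a throwaway tree. The temporary directory, the abc1/abc2 folder names, and the file contents are made up for the demo, and it uses a plain one-level glob instead of globstar so it runs in any POSIX shell:

```shell
# Hypothetical demo tree: abc1 has two matching lines, abc2 has none.
demo=$(mktemp -d)
mkdir -p "$demo/abc1" "$demo/abc2"
printf 'pattern\nother\npattern\n' > "$demo/abc1/file.txt"
printf 'nothing here\n' > "$demo/abc2/file.txt"

# grep -c prints a count either way, but exits 0 only when at least
# one line matched, so the printf runs for abc1 and is skipped for abc2.
report=$(
  cd "$demo" &&
  for file in */file.txt; do
    if count=$(grep -c 'pattern' "$file"); then
      printf '%d %s\n' "$count" "${file%/*}"
    fi
  done
)
printf '%s\n' "$report"
rm -rf "$demo"
```

Note that grep -c counts matching lines, so a line containing the word twice still counts once.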

Note that this may be high impact because it launches a separate grep on each file that is found. A lighter weight alternative might be to run a single grep on the fileglob, and parse the results:

shopt -s globstar

grep -c 'pattern' **/file.txt | grep -v ':0$'

This also depends on bash 4, and of course if you have millions of files you may overwhelm bash's command line maximum length. The output of this will be obvious, but you'll need to parse it with care if your filenames contain colons. I.e. cut -d: -f2 may not cut it.
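One possible way to handle colons in paths is to split each line at the last colon rather than the first, using shell parameter expansion. This is only a sketch: the sample lines below are invented, and it still assumes that file names contain no newlines:

```shell
# Sample lines in grep -c's "path:count" format. The paths are made up,
# and the second one deliberately contains a colon.
lines='abc1/file.txt:2
odd:dir/file.txt:1'

parsed=$(
  printf '%s\n' "$lines" | while IFS= read -r line; do
    count=${line##*:}   # everything after the last colon
    path=${line%:*}     # everything before the last colon
    printf '%s\t%s\n' "$count" "$path"
  done
)
printf '%s\n' "$parsed"
```

Because the count is always the final field, anchoring on the last colon keeps any earlier colons as part of the path.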

One more option that leverages grep instead of bash might be:

grep -r --include 'file.txt' -c 'pattern' ./ | grep -v ':0$'

This uses GNU grep's --include option, which modifies the behaviour of -r (recursive). It should work in Linux, FreeBSD, NetBSD, OSX, but not with the default grep on OpenBSD or most SVR4 (Solaris, HP-UX, etc).

Note that I have tested none of these. No liability assumed. May contain nuts.

Upvotes: 0

Jay jargot

Reputation: 2868

Give this version a try; it is safe for file names that contain whitespace:

find D:/Temp -type f -name file.txt -print0 | xargs -0 grep -Hc "pattern" | grep ":[1-9][0-9]*$"

For each file.txt file found in D:/Temp directory and sub-directories, the xargs command prints the filename and the number of lines which contain pattern (grep -c).

A final grep ":[1-9][0-9]*$" selects only filenames with a count greater than 0.

Upvotes: 0

hek2mgl

Reputation: 158130

I would use

find D:/Temp -type f -name "file.txt" -exec dirname {} \; > tmpfile
wc -l tmpfile
cat tmpfile
rm tmpfile

Upvotes: 0

Robert Seaman

Reputation: 2592

Previously I've used:

grep -Hc "pattern" $(find D:/temp -type f -name "file.txt")

This only works if at least one file.txt is found; with no file arguments, grep would read from standard input instead. The following handles both cases, whether files are found or not:

searchFiles=$(find D:/temp -type f -name "file.txt"); [[ ! -z "$searchFiles" ]] && grep -Hc "pattern" $searchFiles

The output for this would look more like:

D:/temp/abc1/file.txt:2
D:/temp/abc2/file.txt:1
D:/temp/abc3/file.txt:1
D:/temp/abc4/file.txt:1

Upvotes: 0

Dominique

Reputation: 17543

I'd suggest using two commands, one to find all the files:

find ./ -name "file.txt" -exec grep -Fl "pattern" {} \;

And another to count them:

find ./ -name "file.txt" -exec grep -Fl "pattern" {} \; | wc -l

Upvotes: 0

Aaron

Reputation: 24812

You could use GNU grep:

grep -lr --include file.txt "pattern" "D:/Temp/"

This will return the file paths.

grep -cr --include file.txt "pattern" "D:/Temp/"

This will return a count for each file (counting the lines that match the pattern rather than the number of files).

Explanation of the flags:

  • -r makes grep recursively browse its target, which can then be a directory.
  • --include <glob> makes grep restrict its recursive browsing to files matching the <glob>.
  • -l makes grep return only the paths of matching files. Additionally, it stops parsing a file as soon as it has encountered the pattern.
  • -c makes grep return only the number of matching lines.
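To get something close to the output format in the question (a count followed by the paths), one option is to collect the -l output once and derive the count from it. The sketch below assumes GNU grep and file names without newlines; the temporary tree and its contents are invented for the demo, so substitute D:/Temp for "$demo" in practice:

```shell
# Hypothetical demo tree: abc1 and abc3 contain the word, abc2 does not.
demo=$(mktemp -d)
mkdir -p "$demo/abc1" "$demo/abc2" "$demo/abc3"
printf 'pattern\n' > "$demo/abc1/file.txt"
printf 'no match\n' > "$demo/abc2/file.txt"
printf 'pattern\npattern\n' > "$demo/abc3/file.txt"

# List the matching files once, then count the non-empty lines of that
# list so the count and the paths cannot disagree.
matches=$(grep -rl --include=file.txt 'pattern' "$demo" | sort)
total=$(printf '%s\n' "$matches" | grep -c .)

printf '%s\n%s\n' "$total" "$matches"
rm -rf "$demo"
```

Here the count is the number of files containing the word, not the number of occurrences.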

Upvotes: 2

Marcin

Reputation: 3524

This should do it:

find . -name "file.txt" -type f -printf '%p\n' | awk '{print} END { print NR }'

Upvotes: -1
