user3232418

Reputation: 39

count number of lines for each file found

I think I don't quite understand how the find command in Unix works. I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.

find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do 
     nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)" 

     nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
     echo "$nb_ligne_fichier_R"
done

output:

  43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R  
  90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
  45 .//system d exploi/r-repos/gbm/R/checks.R
 178 total: File name too long

Can I save just the total number of lines in my variable? In my example that would be just 178, and the same for each R folder under "$d_path".

Many Thanks

Upvotes: 0

Views: 231

Answers (4)

konsolebox

Reputation: 75588

Consider this solution:

# If `"$dir"/*.R` doesn't match anything, expand to nothing instead of the literal pattern.
shopt -s nullglob

# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob

while IFS= read -ru 4 -d '' dir; do 
    files=("$dir"/*.R)

    echo "${#files[@]}"

    for file in "${files[@]}"; do
        wc -l "$file"
    done

    # Use process substitution to keep the while loop out of a subshell, so
    # variables set inside it survive. That may not matter yet, but it could
    # for future modifications.
    # A custom fd (4) keeps the loop's stdin free; it pairs with `read -u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)

Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only handle newline-terminated paths, so names containing newlines would break it.

shopt -s nullglob
shopt -s nocaseglob

readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)

for dir in "${dirs[@]}"; do
    files=("$dir"/*.R)

    echo "${#files[@]}"

    for file in "${files[@]}"; do
        wc -l "$file"
    done
done
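
Neither snippet above actually stores the grand total in a variable, which is what the question ultimately asks for. A minimal bash sketch of that (the sample tree is made up for illustration and stands in for the question's "$d_path"):

```shell
# Throwaway example tree standing in for the question's "$d_path".
d_path=$(mktemp -d)
mkdir "$d_path/R"
printf 'line1\nline2\n' > "$d_path/R/a.R"
printf 'line1\n'        > "$d_path/R/b.R"

# Sum the line counts of every *.R file directly under the R directories.
total=0
while IFS= read -r -d '' dir; do
    for file in "$dir"/*.R; do
        [ -e "$file" ] || continue                 # unmatched glob: skip the literal pattern
        total=$(( total + $(wc -l < "$file") ))    # `wc -l < file` prints only the number
    done
done < <(find "$d_path" -maxdepth 1 -type d -name R -print0)

echo "$total"   # 3 for the sample tree above
```

Because the find output is read via process substitution rather than a pipe, the while loop runs in the current shell and $total survives past the loop.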

Upvotes: 0

tripleee

Reputation: 189830

Maybe I'm missing something, but wouldn't this do what you want?

wc -l R/*.[Rr]
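
If only the grand total is wanted in a variable, the last line of wc's output can be picked off with awk (a sketch; the throwaway R/ directory here is just for illustration):

```shell
# Throwaway R/ directory for illustration.
cd "$(mktemp -d)"
mkdir R
printf 'a\nb\n' > R/one.R
printf 'c\n'    > R/two.r

# Given several files, wc prints one line per file plus a final "N total"
# line; awk's END block grabs field 1 of that last line. With a single
# file there is no total line, but the last line is then that file's own
# count, which is still the right answer.
total=$(wc -l R/*.[Rr] | awk 'END { print $1 }')
echo "$total"   # 3
```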

Upvotes: 2

Jonathan Leffler

Reputation: 754920

For the command:

find "$d_path" -type d -maxdepth 1 -name R -print0

there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:

  1. The number of files matching *.R
  2. For each such file, the number of lines in it.

Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.

counter.sh

shopt -s nullglob
for dir in "$@"
do
    count=0
    for file in "$dir"/*.R; do ((count++)); done
    echo "$count"
    wc -l "$dir"/*.R </dev/null
done

The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
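
A quick illustration of both behaviours (the directory name is made up and deliberately does not exist):

```shell
shopt -s nullglob

# With nullglob, an unmatched glob expands to zero words rather than the
# literal pattern, so the array ends up empty.
files=(./no-such-dir/*.R)
echo "${#files[@]}"   # 0

# With no file arguments, wc reads standard input; redirecting from
# /dev/null makes it report 0 lines instead of waiting for the keyboard.
lines=$(wc -l "${files[@]}" </dev/null)
echo "$lines"
```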

On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:

    for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done

or use find (rather carefully):

    find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +

Using counter.sh

find "$d_path" -type d -maxdepth 1 -name R -exec bash ./counter.sh {} +

This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The bash ./counter.sh part of the find command assumes that the counter.sh script is in the current directory; note that it should be run with bash rather than plain sh, because shopt and ((count++)) are bash features. If the script is executable and can be found on $PATH, then you can drop the bash and the ./.

Discussion

The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and using xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need (but you could write one as a shell script), then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.

Upvotes: 0

triggerNZ

Reputation: 4771

Solution:

find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
    nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
    echo "$nb_fichier_R"

    find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
        wc -l "$fille"   # quoted, so file names with spaces survive word splitting
    done
done

Explanation:

Adding -print0 made the first find produce NUL-terminated names instead of newline-terminated ones, so you had to tell read, via -d '', not to look for a newline. Your subsequent finds output newlines, so you can use read without a delimiter there. I removed -print0 and -d '' from all the calls so the pipeline is consistent and idiomatic. Newlines are good in the Unix world.
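
The delimiter just has to match on both ends of the pipe; a tiny self-contained sketch of the two pairings (using `.` so it runs anywhere):

```shell
# -print0 emits NUL-terminated paths, which `read -r -d ''` expects.
find . -maxdepth 0 -print0 | while IFS= read -r -d '' p; do
    echo "NUL-delimited: $p"
done

# Plain -print (the default) emits newline-terminated paths for plain read.
find . -maxdepth 0 | while IFS= read -r p; do
    echo "newline-delimited: $p"
done
```

Both loops print the single entry `.`; mixing the pairings up (NUL output into a newline-reading read, or vice versa) is what makes a loop silently produce nothing or mangle names.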

Upvotes: 0
