user1189851
Reputation: 5041

cat reading from an array of files

I collect TSV files daily in a directory structure of the form /tmp/data/$yearmonth/$day/$hour, so there are 24 hour directories inside /tmp/data/$yearmonth/$day.

I have a shell script like this:

yearmonth=`date -d "-2 days" +%Y%m`
day=`date -d "-2 days" +%d`

files=()
cd /tmp/data/$yearmonth/$day
for i in `ls -a */*.tsv`
do
  files+=($i)
done

The array files now holds all the .tsv files. I want to "cat" all of them into one single TSV file and run sort | uniq -c on it. How do I do that? As the TSV files become huge, cat can get very slow. What alternatives are there? Thanks.

Upvotes: 1

Views: 259

Answers (1)

user1666959
Reputation: 1855

Some issues with the code you are showing:

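  1. If your subdirectories hold enough files, or the names are long enough, the */*.tsv glob passed to ls -a expands past the system's argument-size limit and the command fails with "Argument list too long". Looping over unquoted ls output also breaks on file names that contain whitespace. The standard remedy is to use find:

    find "/tmp/data/$yearmonth/$day" -type f -iname '*.tsv' -print0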

  2. Once you have find, you can pipe the NUL-separated file list it generates through xargs, which hands the names to sort as arguments:

    | xargs -0 sort --unique

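No cat is involved, but of course the files still need to be found and read. Note that sort --unique only drops duplicate lines; it does not count them. To get the counts you asked for, use plain sort and pipe its output into uniq -c.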
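One caveat with xargs is that a very long file list may be split across several sort invocations, each of which sorts only its own batch. A minimal sketch of a way around that, assuming GNU coreutils (the --files0-from option is a GNU sort extension), is to let sort read the NUL-separated names from find itself. The function name count_tsv_lines is hypothetical and only used for illustration:

```shell
#!/usr/bin/env bash
# Sketch: count how often each line occurs across all .tsv files under a directory.
# count_tsv_lines is a hypothetical helper name, not something from the posts above.
# Assumes GNU sort (for --files0-from) and at least one matching file.
count_tsv_lines() {
  local dir=$1
  # find prints NUL-terminated names; sort reads that list from stdin,
  # so one sort process handles every file regardless of how many there are.
  # LC_ALL=C makes both tools compare lines byte by byte.
  find "$dir" -type f -iname '*.tsv' -print0 |
    LC_ALL=C sort --files0-from=- |
    LC_ALL=C uniq -c
}

# Example call, assuming GNU date and the directory layout from the question:
# count_tsv_lines "/tmp/data/$(date -d '-2 days' +%Y%m)/$(date -d '-2 days' +%d)"
```

Setting LC_ALL=C is a common way to speed up sort on large inputs, at the cost of byte-order rather than locale-aware ordering; for counting duplicates the ordering itself does not matter.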

Upvotes: 1
