I am collecting some TSV files on a daily basis in a directory structure that looks like /tmp/data/$yearmonth/$day/$hour, so there are 24 directories inside /tmp/data/$yearmonth/$day.
I have a shell script like this:
# date -d is GNU date syntax
yearmonth=$(date -d "-2 days" +%Y%m)
day=$(date -d "-2 days" +%d)
files=()
cd "/tmp/data/$yearmonth/$day" || exit 1
for i in $(ls -a */*.tsv)
do
files+=("$i")
done
The array files holds all the TSV files. I want to cat all of them into one single TSV file and then run sort | uniq -c on it. How do I do that? As the TSV files grow huge, cat can get very slow. What would be an alternative? Thanks
Some issues with the code you are showing:
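If you have enough files, or the names in your subdirectories are long enough, ls -a */*.tsv is going to fail with "Argument list too long": the shell expands the glob into the argument list passed to ls, and that list has a size limit. The standard remedy is to use find: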
find "/tmp/data/$yearmonth/$day" -type f -iname '*.tsv' -print0
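If you do want the names in the files array from the question, the NUL-separated output of find can be read into it directly. Unlike the ls loop, this also keeps file names that contain spaces intact. This is a minimal sketch; the function name collect_tsv_files is hypothetical, and it assumes bash 4.4 or newer (for mapfile -d '').

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "files" with every *.tsv file
# below the given directory, without parsing ls output.
# Assumes bash 4.4+ (mapfile -d '' splits input on NUL bytes).
collect_tsv_files() {
  local dir=$1
  # find -print0 terminates each name with NUL, so spaces and newlines
  # in names are safe; -t strips the NUL delimiter from each element.
  mapfile -d '' -t files < <(find "$dir" -type f -iname '*.tsv' -print0)
}
```

Usage: collect_tsv_files "/tmp/data/$yearmonth/$day", then "${files[@]}" expands to the file list.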
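Once you have find, you can pipe the file list it generates directly into sort via xargs, then count the repeated lines with uniq -c (sort --unique would only drop duplicates, without counting them):
| xargs -0 sort | uniq -c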
No cat is involved, but of course the files still have to be found and read.
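One caveat with xargs: if the file list is too long for a single command line, xargs runs sort several times. Each chunk is then sorted only on its own, so uniq -c can under-count lines that occur in more than one chunk. Here is a sketch that avoids this, assuming GNU coreutils. With sort --files0-from=-, sort reads the NUL-separated names from standard input, so a single sort process sees every file. GNU sort also spills to temporary files when the input does not fit in memory. The function name count_tsv_lines is hypothetical. Setting LC_ALL=C is an optional speed-up that compares bytes instead of using locale collation. It changes the output order but not the counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the equivalent of cat *.tsv | sort | uniq -c for all
# *.tsv files below a directory, without cat or an argument-list limit.
# Assumes GNU coreutils (sort --files0-from).
count_tsv_lines() (
  # Subshell body, so LC_ALL=C only applies inside this function.
  # Byte-wise comparison is usually faster than locale-aware collation.
  export LC_ALL=C
  # One sort process reads the whole NUL-separated file list from stdin.
  find "$1" -type f -iname '*.tsv' -print0 |
    sort --files0-from=- |
    uniq -c
)
```

Usage: count_tsv_lines "/tmp/data/$yearmonth/$day" > counts.txt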