Reputation: 471
I have many text files in a directory, named 1.txt, 2.txt, 3.txt, 4.txt, ..., 2000.txt,
and I want to paste them together into one large file.
To do this I tried
paste *.txt > largefile.txt
but this command reads the .txt files in what looks like a random order, so I need a way to read the files sequentially and paste them as 1.txt 2.txt 3.txt ... 2000.txt.
Please suggest a better solution for pasting many files.
Thanks and looking forward to hearing from you.
Upvotes: 0
Views: 434
Reputation: 140960
Sort the file names numerically yourself then.
printf "%s\n" *.txt | sort -n | xargs -d '\n' paste
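To see what this pipeline does, here is a small sketch on throwaway files. The temporary directory and the file contents are made up for illustration, and xargs -d assumes GNU xargs.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data: 1.txt ... 12.txt, each holding its own number.
for n in 1 2 3 4 5 6 7 8 9 10 11 12; do
    echo "$n" > "$n.txt"
done

# sort -n orders the names by their leading number; paste then joins
# the files side by side, separated by tabs.
result=$(printf '%s\n' *.txt | sort -n | xargs -d '\n' paste)
printf '%s\n' "$result"
```

Each file has one line here, so the result is a single tab-separated line with the numbers in numeric order.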
When dealing with many files, you may hit the open-files limit, ulimit -n. On my system ulimit -n is 1024, but this is a soft limit and can be raised (up to the hard limit, ulimit -Hn) with something like ulimit -n 99999.
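A quick way to check both limits before deciding, as a sketch that assumes a shell whose ulimit builtin supports -n, -S and -H (bash and dash do):

```shell
#!/bin/sh
soft=$(ulimit -Sn)   # current soft limit on open file descriptors
hard=$(ulimit -Hn)   # ceiling up to which the soft limit may be raised
echo "soft=$soft hard=$hard"

# Raise the soft limit to the hard limit. This only affects the
# current shell and the processes it starts.
ulimit -Sn "$hard" || echo "could not raise the soft limit" >&2
```

Going above the hard limit normally needs extra privileges.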
Without raising the soft limit, you can use a temporary file that accumulates the result in "rounds", each round pasting fewer than ulimit -n files, like:
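rm -f accumulator.txt
# Leave headroom: stdin, stdout, stderr and accumulator.txt also use descriptors.
... | xargs -d '\n' -n $(($(ulimit -n) - 8)) sh -c '
    if [ -e accumulator.txt ]; then
        paste accumulator.txt "$@" > accumulator.txt.sav
    else
        # First round: an empty accumulator would add an empty leading column.
        paste "$@" > accumulator.txt.sav
    fi
    mv accumulator.txt.sav accumulator.txt
' _
cat accumulator.txt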
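The round-by-round idea can be tried end to end on a handful of files. This is only a sketch: the scratch directory, the sample files and the batch size of 3 are made up so that several rounds happen, and xargs -d again assumes GNU xargs.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data: 1.txt ... 7.txt, each holding its own number.
for n in 1 2 3 4 5 6 7; do
    echo "$n" > "$n.txt"
done

batch=3   # hypothetical; in practice something safely below ulimit -n
rm -f accumulator.txt

# Each xargs round pastes up to $batch new files onto the accumulated result.
printf '%s\n' [0-9]*.txt | sort -n | xargs -d '\n' -n "$batch" sh -c '
    if [ -e accumulator.txt ]; then
        paste accumulator.txt "$@" > accumulator.txt.sav
    else
        paste "$@" > accumulator.txt.sav
    fi
    mv accumulator.txt.sav accumulator.txt
' _

result=$(cat accumulator.txt)
printf '%s\n' "$result"
```

With seven files and a batch size of 3 this runs three rounds, and the columns still come out in numeric order.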
Upvotes: 1
Reputation: 26471
In bash or any other shell, glob expansions are sorted lexicographically (according to the locale's collation order), not numerically. With numbered files, this sadly means that you can get 11.txt < 1.txt < 2.txt. This weird ordering comes from the fact that names are compared character by character, and the collation in effect ranks 1 before the <dot>-character (".").
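The exact order depends on the locale, so it is worth checking on your own system. A small sketch with made-up file names in a scratch directory; the C locale is forced for the first listing because its order is plain byte order:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch 1.txt 2.txt 10.txt 11.txt

# In the C locale names are compared byte by byte, and "." (0x2E)
# sorts before the digits (0x30-0x39).
c_order=$(LC_ALL=C sh -c 'echo *.txt')
echo "C locale:       $c_order"

# The current locale may order the same names differently.
echo "current locale: $(echo *.txt)"
```

Neither listing is numeric: in the C locale 10.txt and 11.txt land between 1.txt and 2.txt.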
So here are a couple of ways to operate on your files in order:
rename all your files:
for i in *.txt; do mv -- "$i" "$(printf '%05d.txt' "${i%.*}")"; done
paste *.txt
use brace-expansion:
Brace expansion is a mechanism that allows for the generation of arbitrary strings. For integers you can use {n..m} to generate all numbers from n to m, or {n..m..s} to generate all numbers from n to m in steps of s:
paste {1..2000}.txt
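The downside here is that a file may be missing (e.g. 1234.txt), in which case paste fails. So you can do
shopt -s extglob nullglob
paste ?({1..2000}.txt)
The pattern ?(pattern) matches zero or one occurrence of the given pattern, so each word matches either the file or nothing. With nullglob set, words that match nothing are removed, so the missing files are excluded while the order is kept. Note that shopt -s extglob has to be on its own line, before the line that uses ?(...), because bash parses a whole line before running it.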
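If you would rather not depend on extglob, the same "skip what is missing, keep the order" idea can be written as a plain loop. This is a portable sketch, not the approach above: the scratch directory, the sample files and the upper bound of 10 are made up (the question would use 2000).

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Sample data: 4.txt and 7.txt are deliberately missing.
for n in 1 2 3 5 6 8 9 10; do
    echo "$n" > "$n.txt"
done

last=10   # hypothetical upper bound
set --    # clear the positional parameters; they serve as the file list
n=1
while [ "$n" -le "$last" ]; do
    # Append the name only if the file actually exists.
    if [ -e "$n.txt" ]; then
        set -- "$@" "$n.txt"
    fi
    n=$((n + 1))
done

result=$(paste "$@")
printf '%s\n' "$result"
```

The loop counts upward, so the list is in numeric order by construction and no sorting step is needed.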
Upvotes: 0
Reputation: 12147
Instead of using the wildcard * to enumerate all the files in a directory, if your file names follow a sequential pattern, you can list the files explicitly in order and concatenate them into a large file. The order in which * enumerates files can differ between environments, so it may not work as you expect.
Below is a simple example
$ for i in $(seq 20); do echo "$i" > "$i.txt"; done
# create 20 test files, 1.txt, 2.txt, ..., 20.txt, containing the numbers 1 to 20 respectively
$ cat {1..20}.txt
# show the content of all files in the order 1.txt, 2.txt, ..., 20.txt
$ cat {1..20}.txt > 1_20.txt
# concatenate them to a large file named 1_20.txt
Upvotes: 0