manas

Reputation: 471

pasting many files to a single large file

I have many text files in a directory, named 1.txt, 2.txt, 3.txt, 4.txt, ..., 2000.txt, and I want to paste them together into one large file.

I tried the following:

paste *.txt > largefile.txt

However, this command appears to read the .txt files in random order rather than in numerical order. I need the files to be read sequentially and pasted as 1.txt, 2.txt, 3.txt, ..., 2000.txt. Please suggest a better solution for pasting many files. Thanks, and I look forward to hearing from you.

Upvotes: 0

Views: 434

Answers (3)

KamilCuk

Reputation: 140960

Sort the file names numerically yourself then.

printf "%s\n" *.txt | sort -n | xargs -d '\n' paste
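As a small sanity check of the pipeline above, the following sketch builds a few sample files in a scratch directory and pastes them in numeric order. The scratch directory, the file contents and the output name merged.out are made up for illustration, and the -d option assumes GNU xargs.

```shell
set -eu
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

# Sample data: 1.txt .. 12.txt, each holding its own number on one line.
for i in $(seq 1 12); do
    echo "$i" > "$i.txt"
done

# Sort the names numerically and hand them all to a single paste call.
# The output name deliberately does not match *.txt, so a rerun cannot
# pick up the result as an input file.
printf '%s\n' *.txt | sort -n | xargs -d '\n' paste > merged.out

cat merged.out
```

Writing the result to a name outside the *.txt pattern is a design choice of this sketch: an output such as largefile.txt would itself match the glob the next time the command is run.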

When dealing with many files, you may hit the limit on open files, ulimit -n. On my system ulimit -n is 1024, but this is a soft limit and can be raised, up to the hard limit, with for example ulimit -n 99999.
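To see how much headroom there is before changing anything, both limits can be inspected first. This is a minimal sketch; the variable names soft and hard are only illustrative, and the change applies to the current shell and the processes it starts.

```shell
# Show the per-process limit on open file descriptors.
soft=$(ulimit -Sn)   # soft limit: the one paste actually runs into
hard=$(ulimit -Hn)   # hard limit: ceiling up to which the soft limit may be raised
echo "soft=$soft hard=$hard"

# Raise the soft limit as far as the hard limit allows.
ulimit -Sn "$hard"
echo "soft limit is now $(ulimit -Sn)"
```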

Without raising the soft limit, use a temporary file that accumulates the result, pasting roughly ulimit -n files in each "round", like:

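touch accumulator.txt
# Leave room for stdin, stdout, stderr and accumulator.txt among the open files.
... | xargs -d '\n' -n $(($(ulimit -n) - 4)) sh -c '
       # On the first round the accumulator is empty; skip it to avoid an empty first column.
       if [ -s accumulator.txt ]
       then paste accumulator.txt "$@"
       else paste "$@"
       fi > accumulator.txt.sav &&
       mv accumulator.txt.sav accumulator.txt
' _
cat accumulator.txt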

Upvotes: 1

kvantour

Reputation: 26471

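In bash or any other shell, glob expansions are sorted lexicographically, according to the collation order of the current locale. When files are numbered, this sadly means that the result is not in numerical order: depending on the locale you may get, for example, 11.txt < 1.txt < 2.txt. This odd ordering arises when, in the locale's collation, the digit 1 sorts before the dot character (".").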
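The exact order a glob produces depends on the collation rules in effect. The sketch below uses three made-up files in a scratch directory and prints the glob order in the C locale, where names are compared byte by byte, next to the numeric order. Other locales may order the names differently, but in neither case does the glob alone give numeric order.

```shell
set -eu
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"
touch 1.txt 2.txt 11.txt

# In the C locale "." (0x2E) sorts before "1" (0x31), so 1.txt comes before 11.txt,
# and 11.txt comes before 2.txt because "1" sorts before "2".
LC_ALL=C
export LC_ALL
c_order=$(echo *.txt)
echo "glob, C locale: $c_order"      # 1.txt 11.txt 2.txt

# Numeric order, obtained by sorting the names explicitly.
numeric_order=$(printf '%s\n' *.txt | sort -n | paste -sd ' ' -)
echo "numeric order:  $numeric_order"   # 1.txt 2.txt 11.txt
```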

So here are a couple of ways to operate on your files in order:

rename all your files:

for i in *.txt; do mv "$i" "$(printf "%05d.txt" "${i%.*}")"; done
paste *.txt

use brace-expansion:

Brace expansion is a mechanism that allows for the generation of arbitrary strings. For integers you can use {n..m} to generate all numbers from n to m or {n..m..s} to generate all numbers from n to m in steps of s:

paste {1..2000}.txt

The downside here is that paste fails if one of the files is missing (e.g. 1234.txt). So you can do

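shopt -s extglob nullglob
paste ?({1..2000}.txt)

The pattern ?(pattern) matches zero or one occurrence of the given pattern, so each word is treated as a glob rather than as a literal file name. With nullglob set, a word that matches no file is removed instead of being passed on unchanged, so the missing files are excluded while the order is kept. The shopt command goes on its own line because extglob has to be enabled before bash parses the line containing the pattern.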
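If changing shell options is not desirable, a plain loop that tests each name gives the same effect. This is only a sketch under stated assumptions: the sample files, the range 1 to 5 and the output name merged.out are made up for illustration, and the positional parameters are used as a portable stand-in for an array.

```shell
set -eu
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

# Sample data: 1.txt .. 5.txt, with 3.txt deliberately missing.
for i in 1 2 4 5; do
    echo "$i" > "$i.txt"
done

# Collect the files that exist, in numeric order, in the positional parameters.
set --
for i in $(seq 1 5); do
    if [ -e "$i.txt" ]; then
        set -- "$@" "$i.txt"
    fi
done

# Paste only the files that were found; the gap at 3.txt is skipped.
paste "$@" > merged.out
cat merged.out
```

For 2000 files the same open-file limit applies as with any single paste call.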

Upvotes: 0

alijandro

Reputation: 12147

Instead of using the wildcard * to enumerate all files in the directory, if your file names are numbered sequentially, you can list the files explicitly in order and concatenate them into one large file. The order in which * expands can differ between environments, so it may not work the way you expect.

Below is a simple example

$ for i in $(seq 20); do echo "$i" > "$i.txt"; done
# creates 20 test files, 1.txt, 2.txt, ..., 20.txt, containing the numbers 1 to 20 respectively
$ cat {1..20}.txt
# shows the contents of all files in the order 1.txt, 2.txt, ..., 20.txt
$ cat {1..20}.txt > 1_20.txt
# concatenates them into one large file named 1_20.txt
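Note that cat and paste combine files differently: cat appends one file after another, while paste joins corresponding lines side by side, separated by tabs. A minimal sketch with two made-up files (the names stacked.out and side_by_side.out are only illustrative) shows the difference:

```shell
set -eu
workdir=$(mktemp -d)      # hypothetical scratch directory
cd "$workdir"

printf 'a\nb\n' > 1.txt
printf 'c\nd\n' > 2.txt

# cat stacks the files vertically: four lines a, b, c, d.
cat 1.txt 2.txt > stacked.out

# paste joins them column-wise: two lines, "a<TAB>c" and "b<TAB>d".
paste 1.txt 2.txt > side_by_side.out

cat stacked.out
cat side_by_side.out
```

The explicit, ordered file list works the same way with either command.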

Upvotes: 0
