discipulus

Reputation: 2715

Merge multiple files preserving the original sequence in unix

I have multiple (more than 100) text files in the directory such as

files_1_100.txt
files_101_200.txt

Each file contains variable names. For example, files_1_100.txt contains some of the variable names numbered between 1 and 100:

"var.1"
"var.5"
"var.15"

Similarly, files_201_300.txt contains some of the variables numbered between 201 and 300:

"var.203"
"var.227"
"var.285"

and files_1001_1100.txt as

"var.1010"
"var.1006"
"var.1025"

I can merge them using the command

cat files_*00.txt > ../all_files.txt

However, the contents of all_files.txt do not follow the numeric order of the source files. For example, all_files.txt shows

"var.1010"
"var.1006"
"var.1025"
"var.1"
"var.5"
"var.15"
"var.203"
"var.227"
"var.285"

So, how can I ensure that the contents of files_1_100.txt come first, followed by files_201_300.txt and then files_1001_1100.txt, so that all_files.txt contains

"var.1"
"var.5"
"var.15"
"var.203"
"var.227"
"var.285"
"var.1010"
"var.1006"
"var.1025"

Upvotes: 3

Views: 3039

Answers (6)

John B

Reputation: 3646

You could also do this with Awk by splitting and sorting ARGV:

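awk 'BEGIN {
    # Insertion sort of ARGV[1..ARGC-1] by the first number in each filename.
    # (A single pass of adjacent swaps is not enough to sort every input.)
    for (i = 2; i < ARGC; i++) {
        tmp = ARGV[i]
        split(tmp, curr, "_")
        for (j = i - 1; j >= 1; j--) {
            split(ARGV[j], prev, "_")
            if (prev[2] + 0 <= curr[2] + 0) break   # +0 forces numeric comparison
            ARGV[j + 1] = ARGV[j]
        }
        ARGV[j + 1] = tmp
    }
}1' files_*00.txt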

Upvotes: 0

Slizzered

Reputation: 899

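You could try a for loop that appends the files one by one. The -v option (GNU ls) sorts names in natural numeric order even when the numbers are not zero-padded. Because >> appends, remove any existing ../all_files.txt before re-running: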

for i in $(ls -v files_*.txt)
do
    cat "$i" >> ../all_files.txt
done

or, more conveniently, on a single line:

for i in $(ls -v files_*.txt) ; do cat "$i" >> ../all_files.txt ; done
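A variant of the same loop that does not parse ls output might look like the sketch below. It assumes GNU sort (for -V, version sort) and filenames without newlines. The temporary directory and sample contents are made up for the demonstration.

```shell
# Sketch: merge in natural numeric order without parsing `ls`.
# Assumes GNU sort -V; the demo directory and file contents are hypothetical.
demo=$(mktemp -d)
mkdir "$demo/parts"
cd "$demo/parts" || exit 1
printf '"var.1"\n'    > files_1_100.txt
printf '"var.203"\n'  > files_201_300.txt
printf '"var.1010"\n' > files_1001_1100.txt

# printf is a shell builtin, so it prints one name per line for sort.
# A single `>` on the whole loop truncates the output once, so re-running
# the loop does not duplicate lines the way `>>` would.
printf '%s\n' files_*.txt | sort -V | while IFS= read -r f; do
    cat "$f"
done > ../all_files.txt

cat ../all_files.txt
```

Reading names with `while IFS= read -r` keeps names containing spaces intact, which the `for i in $(...)` form does not.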

Upvotes: 0

Peter Bowers

Reputation: 3093

I have not tested this yet, but I think it will work:

ls file*.txt | sort -n -t _ -k2 -k3 | xargs cat

The idea is to take your list of files and sort them and then pass them to the cat command.

The sort uses several options:

  • -n - use a numeric sort rather than alphabetic
  • -t _ - divide the input (the filename) into fields using the underscore character
  • -k2 -k3 - sort first by the 2nd field and then by the 3rd field (the 2 numbers)

You have said that your files are named files_1_100.txt, files_101_200.txt, etc. If that means (as it seems to) that the first numeric "chunk" is always unique, then you can leave off the -k3 flag. That flag is needed only if you end up with, for instance, file_100_2.txt and file_100_10.txt, where you have to look at the second numeric "chunk" to determine the preferred order.
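To see the second key break a tie, here is a small sketch on made-up filenames. It writes the keys as -k2,2n and -k3,3n, which limits each key to a single field. A bare -k2 extends the key to the end of the line, although with -n only the leading number is used either way.

```shell
# Hypothetical names: two share the first number (100), so the second decides.
sorted=$(printf '%s\n' file_100_10.txt file_100_2.txt file_20_5.txt |
    sort -t _ -k2,2n -k3,3n)
printf '%s\n' "$sorted"
# Prints:
# file_20_5.txt
# file_100_2.txt
# file_100_10.txt
```

Without the second key, the tie between the two file_100_* names falls back to a plain byte comparison of the whole line, which puts file_100_10.txt before file_100_2.txt.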

Depending on the number of files you are working with, expanding the glob (file*.txt) on the ls command line may exceed the system's argument-length limit and fail with an "Argument list too long" error. If that's the case, you could do it like this:

ls | grep '^file.*\.txt$' | sort -n -t _ -k2 -k3 | xargs cat

Upvotes: 2

anishsane

Reputation: 20980

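If your filenames contain no special characters or whitespace, the other answers are easy solutions. Otherwise, try this rename-based approach: temporarily zero-pad the numbers so that the glob expands in numeric order, concatenate, then restore the original names. As written, the patterns pad to three digits; for four-digit numbers such as 1001, change both {3} quantifiers to {4}.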

$ ls files_*.txt
files_101_200.txt  files_1_100.txt

$ rename  's/files_([0-9]*)_([0-9]*)/files_000$1_000$2/;s/files_0*([0-9]{3})_0*([0-9]{3})/files_$1_$2/' files_*.txt

$ ls files_*.txt
files_001_100.txt  files_101_200.txt

$ cat files_*.txt > outputfile.txt

$ rename 's/files_0*([0-9]*)_0*([0-9]*)/files_$1_$2/' files_*.txt
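The reason padding helps can be seen without renaming anything. The sketch below is only an illustration, with made-up numbers and a simplified name pattern: once every number has the same width, plain lexical order and numeric order agree.

```shell
# Sketch: zero-padded names sort the same lexically as numerically.
# The numbers and the files_%04d.txt pattern are hypothetical examples.
padded=$(for n in 1001 1 201; do
    printf 'files_%04d.txt\n' "$n"
done | LC_ALL=C sort)
printf '%s\n' "$padded"
# Prints:
# files_0001.txt
# files_0201.txt
# files_1001.txt
```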

Upvotes: 1

anubhava

Reputation: 784998

You can use printf and sort, then pipe the result to xargs cat:

printf "%s\0" f*txt | sort -z -t_ -nk2 | xargs -0 cat > ../all_files.txt

Note that the whole pipeline works on NUL-terminated filenames, so this command works even for filenames containing spaces, newlines, etc.
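A quick way to check the NUL-terminated behaviour is to try it on names that contain a space. The sketch below assumes GNU sort (-z) and an xargs that supports -0; the directory, filenames, and contents are invented for the test.

```shell
# Sketch: NUL-delimited names survive spaces (GNU sort -z, xargs -0 assumed).
work=$(mktemp -d)
cd "$work" || exit 1
printf 'second\n' > 'my files_10_19.txt'   # hypothetical names with a space
printf 'first\n'  > 'my files_2_9.txt'

# Each name ends with a NUL byte instead of a newline, so neither sort nor
# xargs splits "my files_2_9.txt" at the space.
merged=$(printf '%s\0' 'my files_'*.txt | sort -z -t_ -k2,2n | xargs -0 cat)
printf '%s\n' "$merged"
# Prints:
# first
# second
```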

Upvotes: 2

user1717259

Reputation: 2863

cat itself does not sort anything: the shell expands file_* in alphabetical (lexical) order, not numeric order, before cat runs.

List them in numerical order and then cat each one, appending the output to some file.

ls -1 files_*.txt | sort -t _ -k2,2n | xargs -I{} cat {} >> file.out
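The difference between the two orders can be checked on empty files named as in the question. This is a minimal sketch using a throwaway directory: it compares the glob's own expansion order with the result of a numeric sort on the first number.

```shell
# Sketch: glob expansion order versus numeric order (throwaway directory).
work=$(mktemp -d)
cd "$work" || exit 1
touch files_1_100.txt files_201_300.txt files_1001_1100.txt

# The shell expands the glob in collation order before any command runs.
glob_order=$(printf '%s\n' files_*.txt)

# Splitting on "_" and sorting field 2 numerically gives the wanted order.
numeric_order=$(printf '%s\n' files_*.txt | sort -t _ -k2,2n)

printf 'glob:\n%s\n\nnumeric:\n%s\n' "$glob_order" "$numeric_order"
```

In the glob listing, files_1001_1100.txt comes first, which matches the misordered all_files.txt shown in the question.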

Upvotes: 0
