Reputation: 2715
I have multiple (more than 100) text files in a directory, such as
files_1_100.txt
files_101_200.txt
The contents of each file are the names of some variables; for example, files_1_100.txt
contains some variable names between 1 and 100:
"var.2"
"var.5"
"var.15"
Similarly, files_201_300.txt
contains some variable names between 201 and 300:
"var.203"
"var.227"
"var.285"
and files_1001_1100.txt
contains:
"var.1010"
"var.1006"
"var.1025"
I can merge them using the command
cat files_*00.txt > ../all_files.txt
However, the contents of all_files.txt do not follow the numerical order of the parent files. For example, all_files.txt
shows:
"var.1010"
"var.1006"
"var.1025"
"var.1"
"var.5"
"var.15"
"var.203"
"var.227"
"var.285"
So, how can I ensure that the contents of files_1_100.txt come first, followed by files_201_300.txt and then files_1001_1100.txt, so that all_files.txt contains:
"var.1"
"var.5"
"var.15"
"var.203"
"var.227"
"var.285"
"var.1010"
"var.1006"
"var.1025"
Upvotes: 3
Views: 3039
Reputation: 3646
You could also do this with Awk, by numerically sorting ARGV in a BEGIN block before any input is read (the trailing 1 then prints every line of every file in the sorted order):
awk 'BEGIN {
  # Bubble-sort ARGV on the first number in each files_X_Y.txt name
  for (i = 1; i < ARGC - 1; i++) {
    for (j = 1; j <= ARGC - 1 - i; j++) {
      split(ARGV[j],   a, "_")
      split(ARGV[j+1], b, "_")
      if (a[2] + 0 > b[2] + 0) {
        tmp = ARGV[j]
        ARGV[j] = ARGV[j+1]
        ARGV[j+1] = tmp
      }
    }
  }
}1' files_*00.txt > ../all_files.txt
Upvotes: 0
Reputation: 899
You could try using a for loop and appending the files one by one (the -v option of ls sorts the names in natural numeric order, which handles numbers that are not zero-padded):
for i in $(ls -v files_*.txt)
do
    cat "$i" >> ../all_files.txt
done
or, more conveniently, on a single line:
for i in $(ls -v files_*.txt) ; do cat "$i" >> ../all_files.txt ; done
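Note that iterating over $(ls ...) relies on word splitting, so it will misbehave if a filename contains spaces. A minimal sketch of a variant that avoids parsing ls (assuming GNU sort, which provides the -V natural-sort option, and filenames without embedded newlines):
printf '%s\n' files_*.txt | sort -V | while IFS= read -r f
do
    cat "$f" >> ../all_files.txt
done
sort -V orders files_1_100.txt before files_201_300.txt before files_1001_1100.txt, which is the order asked for.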
Upvotes: 0
Reputation: 3093
I haven't tried it out yet, but I think this will work:
ls files_*.txt | sort -n -t _ -k2 -k3 | xargs cat > ../all_files.txt
The idea is to take your list of files, sort them, and then pass them to the cat command.
The sort uses several options: -n compares numerically, -t _ splits each filename into fields at the underscores, and -k2 -k3 sort on the second field (the first number) and then on the third field (the second number).
You have said that your files are named files_1_100.txt, files_101_200.txt, etc. If that means (as it seems to indicate) that the first numeric "chunk" is always unique, then you can leave off the -k3
flag. That flag is needed only if you end up with, for instance, files_100_2.txt and files_100_10.txt, where you have to look at the second numeric "chunk" to determine the preferred order.
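For example, you can see the -k3 tie-breaking with the hypothetical files_100_2.txt and files_100_10.txt names mentioned above:
$ printf '%s\n' files_100_10.txt files_100_2.txt | sort -n -t _ -k2 -k3
files_100_2.txt
files_100_10.txt
Without -k3 the two keys compare equal on the first number (100) and the order would typically fall back to a plain lexical comparison of the whole line, putting files_100_10.txt first.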
Depending on the number of files you are working with, you may find that the expanded glob (files_*.txt) makes the argument list too long and the command fails with an "Argument list too long" error. If that's the case, you could do it like this:
ls | grep '^files_.*\.txt$' | sort -n -t _ -k2 -k3 | xargs cat > ../all_files.txt
Upvotes: 2
Reputation: 20980
If your filenames are free from special characters and whitespace, then the other answers should be easy solutions.
Otherwise, try this rename-based approach (it uses the Perl rename, and temporarily zero-pads the numbers to four digits so that plain alphabetical glob order becomes numerical order):
$ ls files_*.txt
files_101_200.txt files_1_100.txt
$ rename 's/files_([0-9]*)_([0-9]*)/files_000$1_000$2/;s/files_0*([0-9]{4})_0*([0-9]{4})/files_$1_$2/' files_*.txt
$ ls files_*.txt
files_0001_0100.txt files_0101_0200.txt
$ cat files_*.txt > outputfile.txt
$ rename 's/files_0*([0-9]*)_0*([0-9]*)/files_$1_$2/' files_*.txt
Upvotes: 1
Reputation: 784998
You can use printf and sort, and pipe that to xargs cat:
printf "%s\0" f*txt | sort -z -t_ -nk2 | xargs -0 cat > ../all_files.txt
Note that the whole pipeline works on NUL-terminated filenames, which makes this command work even for filenames containing spaces, newlines, etc.
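If you want to preview the order in which the files will be concatenated, you can reuse the same NUL-terminated pipeline and just convert the separators to newlines for display (a quick check, not part of the command above):
printf "%s\0" f*txt | sort -z -t_ -nk2 | tr '\0' '\n'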
Upvotes: 2
Reputation: 2863
The default order of cat files_*
comes from the shell's glob expansion, which is alphabetical rather than numeric.
List the files in numerical order and then cat each one, appending the output to some file.
ls -1 files_*.txt | sort -t _ -k2,2n | xargs -I {} cat {} >> file.out
Upvotes: 0