Reputation: 4008
I have many text files of fixed-width data, e.g.:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.07826 -3.0
15.104348 -4.0
15.130435 -5.0
15.156522 -6.0
15.182609 -6.9999995
15.208695 -8.0
The data comprise 3 or 4 runs of a simulation, all stored in the one text file, with no separator between runs. In other words, there is no empty line or anything, e.g. if there were only 3 'records' per run it would look like this for 3 runs:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
It's a COMSOL Multiphysics output file for those interested. Visually you can tell where the new run data begin, as the first x-value is repeated (actually the entire second line might be the same for all of them). So I first need to open the file, get this x-value and save it, then use it as a pattern to match with awk or csplit. I am struggling to work this out!
csplit will do the job:
$ csplit -z -f 'temp' -b '%02d.txt' model-q-060.txt '/^15\.0\s/' '{*}'
but I have to know the pattern to split on. This question is similar but each of my text files might have a different pattern to match: Split files based on file content and pattern matching.
Ben.
Upvotes: 3
Views: 6360
Reputation: 86774
Here's a simple awk script that will do what you want:
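BEGIN { fn = 0 }
NR == 1 { next }                  # skip the "% x y" header line
NR == 2 { delim = $1 }            # remember the first x-value
$1 == delim {                     # a new run starts here
    if (f) close(f)               # close the previous output file
    f = sprintf("test%02d.txt", fn++)
    print "Creating " f
}
{ print $0 > f }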
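Since the question mentions many files, each with its own first x-value, here is a rough sketch of running the same logic over all of them. The model-*.txt glob, the runs/ output directory and the name-NN.txt naming are my assumptions, not something from the question:

```shell
#!/bin/sh
# Sketch only: the runs/ directory and the <name>-NN.txt naming are assumptions.

# split_runs FILE: write each run of FILE to runs/<name>-NN.txt
split_runs() {
    base=$(basename "$1" .txt)
    mkdir -p runs
    awk -v prefix="runs/$base" '
        NR == 1 { next }                 # skip the "% x y" header
        NR == 2 { delim = $1 }           # per-file delimiter: first x-value
        $1 == delim {                    # start of a new run
            if (out) close(out)
            out = sprintf("%s-%02d.txt", prefix, n++)
        }
        { print > out }
    ' "$1"
}

# Process every matching file in the current directory.
for file in model-*.txt; do
    [ -e "$file" ] || continue           # the glob matched nothing
    split_runs "$file"
done
```

Writing into a separate directory keeps the output files from matching the input glob if the loop is run a second time.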
Upvotes: 3
Reputation: 1181
If the number of lines per run is constant, you could use this, replacing number_of_runs with the number of runs in the file:
grep -P "^\d" your_file.txt | \
split --lines=$(expr \( $(wc -l < your_file.txt) - 1 \) / number_of_runs)
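If you don't know the number of runs up front, it could probably be counted from the file itself. A minimal sketch, assuming every run starts with the same first x-value as line 2 and that all runs have the same length; the function names and the output prefix are made up for illustration, and I've used tail to drop the header instead of grep:

```shell
#!/bin/sh
# Sketch only: the function names and the output prefix are assumptions.

# count_runs FILE: count the lines whose first field equals the first x-value
count_runs() {
    awk 'NR == 2 { delim = $1 }
         NR >= 2 && $1 == delim { n++ }
         END { print n + 0 }' "$1"
}

# split_equal FILE PREFIX: split FILE into equal-length runs PREFIXaa, PREFIXab, ...
split_equal() {
    runs=$(count_runs "$1")
    [ "$runs" -gt 0 ] || return 1
    total=$(awk 'END { print NR - 1 }' "$1")     # data lines, header excluded
    tail -n +2 "$1" | split -l "$(( total / runs ))" - "$2"
}
```

Called as split_equal your_file.txt run- this would produce run-aa, run-ab and so on, one file per run.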
Upvotes: 0
Reputation: 28628
This should do the job. Test it somewhere you don't have any temp*.txt files you care about, since the script deletes them first: :)
rm -f temp*.txt
cat > f1.txt <<EOF
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
EOF
first=$(awk 'NR==2{print $1}' f1.txt | sed 's/\./\\./g')
echo "--- Splitting by: $first"
csplit -z -f temp -b %02d.txt f1.txt "/^$first\\s/" '{*}'
for i in temp*.txt; do
echo "---- $i"
cat "$i"
done
The output of the above is:
--- Splitting by: 15\.0
51
153
153
136
---- temp00.txt
% x y
---- temp01.txt
15.0 0.0
15.026087 -1.0
15.052174 -2.0
---- temp02.txt
15.0 0.0
15.038486 -1.0
15.066712 -2.0
---- temp03.txt
15.0 0.0
15.041089 -1.0
15.087612 -2.0
Of course, you will run into trouble if the first-column value (15.0
in the above example) repeats within a run. Solving that would be a tad harder; exercise left for the reader...
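For anyone wanting a head start on that exercise: the question says the entire first data line might be identical for every run, so one option is to compare whole lines instead of just the x-value. This is only a sketch under that assumption, it uses awk rather than csplit, and the function name and output naming are hypothetical:

```shell
#!/bin/sh
# Sketch only: assumes every run begins with a line identical to line 2.

# split_on_first_line FILE PREFIX: write each run to PREFIXNN.txt
split_on_first_line() {
    awk -v prefix="$2" '
        NR == 1 { next }                 # skip the "% x y" header
        NR == 2 { first = $0 }           # remember the whole first data line
        $0 == first {                    # an identical line starts a new run
            if (out) close(out)
            out = sprintf("%s%02d.txt", prefix, n++)
        }
        { print > out }
    ' "$1"
}
```

A line such as "15.0 -2.0" in the middle of a run no longer triggers a split, because only a line matching "15.0 0.0" in full does. It will still split wrongly if that exact line can occur mid-run.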
Upvotes: 1