Reputation: 1139
I have one file with -| as the delimiter after each section. I need to create a separate file for each section using Unix.
Example of the input file:
wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|
Expected result in File 1
wertretr
ewretrtret
1212132323
000232
-|
Expected result in File 2
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
Expected result in File 3
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|
Upvotes: 113
Views: 114534
Reputation: 143
Try this Python script:
import os
import argparse

delimiter = '-|'

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input txt")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")
args = parser.parse_args()

# create the output directory if it doesn't exist yet
os.makedirs(args.output_dir, exist_ok=True)

counter = 1
output_filename = 'part-' + str(counter)
with open(args.input_file, 'r') as input_file:
    for line in input_file:
        line = line.rstrip('\n')
        if delimiter in line:
            # a delimiter line starts the next section (the delimiter itself is not written)
            counter = counter + 1
            output_filename = 'part-' + str(counter)
            print('Section ' + str(counter) + ' Started')
        else:
            # skips empty lines (change the condition if you want empty lines too)
            if line.strip():
                output_path = os.path.join(args.output_dir, output_filename + '.txt')
                # 'a' appends, so delete old part-N.txt files before re-running
                with open(output_path, 'a') as output_file:
                    output_file.write("{0}\n".format(line))
Example usage:
python split.py -i ./to-split.txt -o ./output-dir
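The script above drops the -| lines (and blank lines), whereas the expected result in the question keeps the delimiter at the end of each file. A minimal sketch that keeps it, assuming a delimiter is a line consisting of exactly -| (the function names split_keep_delimiter and write_sections, and the part-N.txt naming, are illustrative choices, not from the original):

```python
import os


def split_keep_delimiter(lines, delimiter="-|"):
    """Group lines into sections; each section ends with its delimiter line.

    Assumption: a delimiter is a line whose text (minus the newline) is exactly
    `delimiter`. Lines after the last delimiter form a final section without one.
    """
    sections, current = [], []
    for line in lines:
        current.append(line)
        if line.rstrip("\n") == delimiter:
            sections.append(current)
            current = []
    if current:  # leftover text with no closing delimiter
        sections.append(current)
    return sections


def write_sections(input_path, output_dir):
    """Write each section to output_dir/part-N.txt (hypothetical naming), N from 1."""
    os.makedirs(output_dir, exist_ok=True)
    with open(input_path) as infile:
        sections = split_keep_delimiter(infile)
    for n, section in enumerate(sections, start=1):
        with open(os.path.join(output_dir, f"part-{n}.txt"), "w") as out:
            out.writelines(section)
    return len(sections)
```

For example, write_sections("to-split.txt", "output-dir") would produce part-1.txt to part-3.txt for the sample input, each ending in -|. Unlike the script above, it opens each output file once with "w", so re-running it overwrites rather than appends.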
Upvotes: 0
Reputation: 212664
awk '{f="file" NR; print $0 " -|"> f}' RS='-\\|' input-file
Explanation:
RS is the record separator; this solution uses a GNU awk extension that allows it to be more than one character, in which case it is treated as a regular expression. NR is the record number.
The print statement prints each record followed by " -|" into a file whose name contains the record number.
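To see what a multi-character record separator does to the input, here is a rough Python analogue of RS='-\\|' using re.split on a shortened version of the question's input. It is only an illustration of the splitting, not how gawk is implemented. Note that every record after the first starts with the newline that followed the previous -|, and re.split also returns the text after the final -| (here just a newline) as one more element:

```python
import re

# Shortened version of the question's input (an assumption for brevity).
SAMPLE = "wertretr\n000232\n-|\nereteertetet\n0234342343\n-|\njdhg3875jdfsgfd\n-|\n"

# r"-\|" is a regex: a literal '-' followed by a literal '|'. The '|' must be
# escaped, otherwise it would mean alternation.
records = re.split(r"-\|", SAMPLE)

for number, record in enumerate(records, start=1):  # number plays the role of NR
    print(number, repr(record))

# Output:
# 1 'wertretr\n000232\n'
# 2 '\nereteertetet\n0234342343\n'
# 3 '\njdhg3875jdfsgfd\n'
# 4 '\n'
```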
Upvotes: 49
Reputation: 7745
A one-liner, no programming (except the regexp, etc.):
csplit --digits=2 --quiet --prefix=outfile infile "/-|/+1" "{*}"
Tested on csplit (GNU coreutils) 8.30.
"For OS X users, note that the version of csplit that comes with the OS doesn't work. You'll want the version in coreutils (installable via Homebrew), which is called gcsplit." (@Danial)
"Just to add, you can get the version for OS X to work (at least with High Sierra). You just need to tweak the args a bit: csplit -k -f=outfile infile "/-\|/+1" "{3}". Features that don't seem to work are the "{*}", I had to be specific on the number of separators, and needed to add -k to avoid it deleting all outfiles if it can't find a final separator. Also if you want --digits, you need to use -n instead." (@Pebbl)
Upvotes: 122
Reputation: 49
The following command works for me. Hope it helps.
awk 'BEGIN{file = 0; filename = "output_" file ".txt"}
$0 == "-|" {close(filename); file++; filename = "output_" file ".txt"; next}
{print $0 > filename}' input
The -| lines themselves are not copied to the output files.
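A pattern written as /-|/ is a trap in awk: in an extended regular expression an unescaped | is alternation, so -| reads as "a hyphen or nothing". POSIX leaves that empty alternative undefined, and engines that accept it (Python's re among them) then match every line. A quick Python illustration of why comparing the whole line with the string "-|" (or escaping the bar) is safer; Python's re stands in for awk's regex engine here, which is an assumption about equivalent behaviour:

```python
import re

lines = ["wertretr", "000232", "-|"]

# Unescaped: '-|' is "a hyphen OR the empty string"; the empty branch matches
# at position 0 of any line, so every line counts as a hit.
print([bool(re.search("-|", s)) for s in lines])    # [True, True, True]

# Escaped: '-\|' is a literal hyphen followed by a literal pipe.
print([bool(re.search(r"-\|", s)) for s in lines])  # [False, False, True]

# Plain string comparison avoids regex quoting issues entirely.
print([s == "-|" for s in lines])                   # [False, False, True]
```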
Upvotes: 4
Reputation: 395903
Use csplit if you have it.
If you don't, but you have Python... don't use Perl.
Your file may be too large to hold in memory all at once, so reading it line by line may be preferable. Assume the input file is named "samplein":
$ python3 -c "from itertools import count
with open('samplein') as file:
    for i in count():
        firstline = next(file, None)
        if firstline is None:
            break
        with open(f'out{i}', 'w') as out:
            out.write(firstline)
            for line in file:
                out.write(line)
                if line == '-|\n':
                    break"
Upvotes: 2
Reputation: 1047
Here's a Python 3 script that splits a file into multiple files based on a filename provided by the delimiters. Example input file:
# Ignored
######## FILTER BEGIN foo.conf
This goes in foo.conf.
######## FILTER END
# Ignored
######## FILTER BEGIN bar.conf
This goes in bar.conf.
######## FILTER END
Here's the script:
#!/usr/bin/env python3
import os
import argparse

# global settings
start_delimiter = '######## FILTER BEGIN'
end_delimiter = '######## FILTER END'

# parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input filename")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")
args = parser.parse_args()

# read the input file
with open(args.input_file, 'r') as input_file:
    input_data = input_file.read()

# iterate through the input data by line
input_lines = input_data.splitlines()
while input_lines:
    # discard lines until the next start delimiter
    while input_lines and not input_lines[0].startswith(start_delimiter):
        input_lines.pop(0)

    # corner case: no delimiter found and no more lines left
    if not input_lines:
        break

    # extract the output filename from the start delimiter
    output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
    output_path = os.path.join(args.output_dir, output_filename)

    # open the output file
    print("extracting file: {0}".format(output_path))
    with open(output_path, 'w') as output_file:
        # while we have lines left and they don't match the end delimiter
        while input_lines and not input_lines[0].startswith(end_delimiter):
            output_file.write("{0}\n".format(input_lines.pop(0)))

    # remove end delimiter if present
    if input_lines:
        input_lines.pop(0)
Finally, here's how you run it:
$ python3 script.py -i input-file.txt -o ./output-folder/
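The script above calls input_lines.pop(0) repeatedly, and removing the first element of a Python list is O(n), so very long inputs can get slow. A sketch of the same BEGIN/END extraction as a single pass over the lines, returning {filename: text} instead of writing files (extract_blocks is a hypothetical name; the delimiters are the ones from the example, and it assumes blocks do not nest):

```python
START = "######## FILTER BEGIN"
END = "######## FILTER END"


def extract_blocks(lines, start=START, end=END):
    """Return {filename: text} for every START ... END block; other lines are ignored.

    Assumption: blocks do not nest; a block missing its END runs to end of input.
    """
    blocks = {}
    name = None  # filename of the block being collected, or None outside a block
    for line in lines:
        if name is None:
            if line.startswith(start):
                name = line[len(start):].strip()
                blocks[name] = []  # a repeated filename starts over, like mode 'w'
        elif line.startswith(end):
            name = None
        else:
            blocks[name].append(line)
    return {fname: "".join(body) for fname, body in blocks.items()}
```

Because it only iterates, it also works directly on an open file object, e.g. extract_blocks(open("input-file.txt")), after which each value can be written to os.path.join(output_dir, filename).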
Upvotes: 2
Reputation: 583
I solved a slightly different problem, where the file contains a line with the name of the file that the following text should go into. This Perl code does the trick for me:
#!/path/to/perl -w
# comment out the line below on UNIX systems
use Win32::Clipboard;

# Get command line flags
if (@ARGV == 0) {
    print STDERR "usage: ncsplit.pl [--mff=MARKER] filename.txt\n\nThe mff (file marker) is found at the start of a line, followed by a filename. All of the following contents of filename.txt are written to that file until another mff is found. The default mff is -#-.\n";
    exit;
}

# GetOptions removes the options it recognizes from @ARGV
use Getopt::Long;
my $mff = "";
GetOptions('mff=s' => \$mff);
# set a default $mff variable
if ($mff eq "") { $mff = "-#-" }
print("using file switch=", $mff, "\n\n");

my @filelist;
while ($_ = shift @ARGV) {
    if (-f "$_") {
        push @filelist, $_;
    }
}

# Could be more than one file name on the command line,
# but this version throws away the subsequent ones.
my $readfile = $filelist[0];
open SOURCEFILE, "<$readfile" or die "File not found...\n\n";
while (<SOURCEFILE>) {
    # a line "MARKER filename" starts a new output file
    if (/^\Q$mff\E (.*)$/) {
        my $outname = $1;
        open OUTFILE, ">$outname" or die "Cannot open $outname: $!\n";
        print "opened $outname\n";
    }
    else { print OUTFILE $_ }
}
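For the variant this answer solves (a line "MARKER filename" switches output to a new file), here is a Python sketch of the same idea using the answer's default marker -#-. It returns {filename: text} rather than writing files; route_by_marker is a hypothetical name, and dropping lines that appear before the first marker is an assumption:

```python
def route_by_marker(lines, marker="-#-"):
    """Return {filename: text}, routing lines to the file named on the last marker line.

    A marker line looks like "MARKER filename". Lines before the first marker are
    dropped (an assumption), and reusing a filename starts that file over, like
    Perl's open with ">".
    """
    prefix = marker + " "
    outputs = {}
    current = None  # filename currently receiving lines
    for line in lines:
        if line.startswith(prefix):
            current = line[len(prefix):].rstrip("\n")
            outputs[current] = []  # ">" truncates, so restart this file's contents
        elif current is not None:
            outputs[current].append(line)
    return {name: "".join(body) for name, body in outputs.items()}
```

Using str.startswith instead of a regex sidesteps quoting problems when the marker contains regex metacharacters, which the Perl version handles with \Q...\E.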
Upvotes: 5
Reputation: 7745
Here is some Perl code that will do the job:
#!/usr/bin/perl
open(FI, "file.txt") or die "Input file not found";
$cur = 0;
open(FO, ">res.$cur.txt") or die "Cannot open output file $cur";
while (<FI>)
{
    print FO $_;
    if (/^-\|/)
    {
        close(FO);
        $cur++;
        open(FO, ">res.$cur.txt") or die "Cannot open output file $cur";
    }
}
close(FO);
Upvotes: 0
Reputation: 2909
This is the sort of problem I wrote context-split for: http://stromberg.dnsalias.org/~strombrg/context-split.html
$ ./context-split -h
usage:
./context-split [-s separator] [-n name] [-z length]
-s specifies what regex should separate output files
-n specifies how output files are named (default: numeric
-z specifies how long numbered filenames (if any) should be
-i include line containing separator in output files
operations are always performed on stdin
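The options above suggest a general recipe: a regex decides where one output file ends, and a flag decides whether the separator line is kept. The sketch below is a hypothetical Python illustration of that idea, not context-split's actual code; in it the separator closes the current chunk (so with include=True it ends up at the end of that chunk, matching the question's expected output), empty chunks are skipped, and numbered_name only mimics the spirit of a "-z length" option:

```python
import re
from typing import Iterable, Iterator, List


def split_by_regex(lines: Iterable[str], separator: str,
                   include: bool = False) -> Iterator[List[str]]:
    """Yield chunks of lines; a line matching `separator` ends the current chunk.

    include=True keeps the separator line at the end of its chunk;
    include=False drops it. Empty chunks are skipped.
    """
    pattern = re.compile(separator)
    chunk: List[str] = []
    for line in lines:
        if pattern.search(line):
            if include:
                chunk.append(line)
            if chunk:
                yield chunk
            chunk = []
        else:
            chunk.append(line)
    if chunk:  # text after the last separator
        yield chunk


def numbered_name(n: int, width: int = 3) -> str:
    """Zero-padded numeric file name, e.g. 7 -> '007' for width 3."""
    return str(n).zfill(width)
```

Being a generator, split_by_regex can read sys.stdin lazily and write each chunk as soon as it is complete.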
Upvotes: 0
Reputation: 3357
You can also use awk. I'm not very familiar with awk, but the following did seem to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Do note that the last partn.txt file it generates is empty. I'm not sure how to fix that, but I'm sure it could be done with a little tweaking. Any suggestions, anyone?
awk_pattern file:
BEGIN { fn = "part1.txt"; n = 1 }
{
    print > fn
    if (substr($0, 1, 2) == "-|") {
        close(fn)
        n++
        fn = "part" n ".txt"
    }
}
bash command:
awk -f awk_pattern input.file
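On the empty last file: awk only creates an output file the first time print > fn runs, so a fourth file most likely means something follows the last -|, such as a trailing blank line (an assumption about the input). A Python sketch that only starts a new part when a non-blank line arrives; split_skip_blank_tail is a hypothetical name and the part{n}.txt names mirror the awk script:

```python
def split_skip_blank_tail(lines, delimiter="-|"):
    """Split after each delimiter line; a new part is only started by a non-blank line.

    Blank lines seen while no part is open (e.g. trailing blank lines after the
    final delimiter) are dropped, so no whitespace-only part is produced.
    """
    parts = []
    current = None  # lines of the open part, or None between parts
    for line in lines:
        if current is None:
            if not line.strip():
                continue  # don't open a new part just for a blank line
            current = []
            parts.append(current)
        current.append(line)
        if line.rstrip("\n") == delimiter:
            current = None  # the next non-blank line opens a new part
    return {f"part{n}.txt": "".join(body) for n, body in enumerate(parts, start=1)}
```

The same idea in awk terms is to skip blank lines until the first non-blank line after a delimiter, so that print > fn never runs for the would-be empty file.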
Upvotes: 3
Reputation: 7032
cat file | ( I=0; echo -n "" > file0; while IFS= read -r line; do printf '%s\n' "$line" >> file$I; if [ "$line" == '-|' ]; then I=$((I+1)); echo -n "" > file$I; fi; done )
and the formatted version:
#!/bin/bash
cat FILE | (
    I=0;
    echo -n "" > file0;
    while IFS= read -r line;
    do
        printf '%s\n' "$line" >> file$I;
        if [ "$line" == '-|' ];
        then I=$((I+1));
            echo -n "" > file$I;
        fi;
    done;
)
Upvotes: 1
Reputation: 62519
Debian has csplit, but I don't know if that's common to all/most/other distributions. If not, though, it shouldn't be too hard to track down the source and compile it...
Upvotes: 7