Reputation: 91
I have a large file with 2000 hostnames and I want to create multiple files with 25 hosts per file, with the hosts separated by commas and the trailing comma removed.
Large.txt:
host1
host2
host3
.
.
host10000
The split command below creates multiple files (file1, file2, ...); however, the hosts are not comma-separated, so it's not the expected output.
split -d -l 25 large.txt file
The expected output is:
host1,host2,host3
Upvotes: 2
Views: 2185
Reputation: 34114
You'll need to perform 2 separate operations ... 1) split the file and 2) reformat the files generated by split.
The first step is already done:
split -d -l 25 large.txt file
For the second step, let's work with the results that are dumped into the first file by the basic split command:
$ cat file00
host1
host2
host3
...
host25
We want to pull these lines into a single line using a comma (,) as the delimiter. For this example I'll use an awk solution:
$ cat file00 | awk '{ printf "%s%s", sep, $0 ; sep="," } END { print "" }'
host1,host2,host3...,host25
Where:
- sep is initially undefined (aka an empty string)
- after the first line is printed by awk we set sep to a comma, so every following line gets a comma in front of it
- printf doesn't include a linefeed (\n) so each successive printf will append to the 'first' line of output
- END the script by printing a linefeed to the end of the file
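For a quick sanity check of the sep trick (the three-line sample here is just illustrative; any POSIX awk should behave the same), the same one-liner on a tiny input shows the comma only appearing from the second record onward:
$ printf 'host1\nhost2\nhost3\n' | awk '{ printf "%s%s", sep, $0 ; sep="," } END { print "" }'
host1,host2,host3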
It just so happens that split has an option to call a secondary script/code-snippet to allow for custom formatting of the output (generated by split); the option is --filter. A few issues to keep in mind:
- the output from each split is (effectively) piped as input to the command listed in the --filter option
- double quotes and dollar signs inside the filter need to be escaped so they are not expanded by the current shell before they reach the split command
- the command given to the --filter option automatically has access to the current split outfile name using the $FILE variable
Pulling everything together gives us:
$ split -d -l 25 --filter="awk '{ printf \"%s%s\", sep, \$0 ; sep=\",\" } END { print \"\" }' > \$FILE" large.txt file
$ cat file00
host1,host2,host3...,host25
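If GNU coreutils' paste is available, the same join can be done with a shorter filter; this is an alternative sketch rather than part of the answer above:
$ split -d -l 25 --filter="paste -sd',' - > \$FILE" large.txt file
$ cat file00
host1,host2,host3...,host25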
Upvotes: 1
Reputation: 2177
Using awk:
awk '
  BEGIN { PREFIX = "file"; n = 0; }
  # accumulate hosts on one comma-separated line; sep is empty before the first host of a chunk
  { hosts = hosts sep $0; sep = ","; }
  # write the current chunk to the next output file, then reset the buffer
  function flush() { print hosts > (PREFIX n++); hosts = ""; sep = ""; }
  NR % 25 == 0 { flush(); }
  # write any leftover hosts, unless the last chunk was already flushed
  END { if (hosts != "") flush(); }
' large.txt
edit: improved comma-separation handling, stealing from markp-fuso's excellent answer :)
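A quick way to try it, assuming the awk program above (the part between the single quotes) is saved as split25.awk — an illustrative name — and the sample input is generated with seq:
$ seq -f 'host%g' 60 > large.txt
$ awk -f split25.awk large.txt
$ ls file*
file0  file1  file2
$ cat file0
host1,host2,host3...,host25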
Upvotes: 1
Reputation: 482
You can use the bash code snippet below.
INPUT FILE
~$ cat domainlist.txt
domain1.com
domain2.com
domain3.com
domain4.com
domain5.com
domain6.com
domain7.com
domain8.com
Script
#!/usr/bin/env bash
FILE_NAME=domainlist.txt
LIMIT=4
OUTPUT_PREFIX=domain_
# split the input at line LIMIT and, via the {1} repeat, again at line 2*LIMIT
csplit "${FILE_NAME}" "${LIMIT}" '{1}' -f "${OUTPUT_PREFIX}"
#=====#
# join the lines of each generated chunk with commas (GNU sed, in place)
for file in "${OUTPUT_PREFIX}"*; do
    echo "${file}"
    sed -i ':a;N;$!ba;s/\n/,/g' "${file}"
done
OUTPUT
./mysplit.sh
36
48
12
domain_00
domain_01
domain_02
~$ cat domain_00
domain1.com,domain2.com,domain3.com
Change LIMIT, the OUTPUT_PREFIX file-name prefix, and the input file as per your requirements.
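The sed expression ':a;N;$!ba;s/\n/,/g' is the usual GNU sed idiom for joining all lines of a file: it collects the whole file into the pattern space (label a, append the Next line, branch back until the last line) and then replaces every newline with a comma. A minimal stand-alone check (demo.txt is just an illustrative name):
$ printf 'domain1.com\ndomain2.com\ndomain3.com\n' > demo.txt
$ sed ':a;N;$!ba;s/\n/,/g' demo.txt
domain1.com,domain2.com,domain3.com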
Upvotes: 0
Reputation: 13059
Using the --filter option on GNU split:
split -d -l 25 --filter="(perl -ne 'chomp; print \",\" if \$i++; print'; echo) > \$FILE" large.txt file
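As with the awk-based filter earlier in the thread, each generated file should then contain a single comma-separated line:
$ cat file00
host1,host2,host3...,host25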
Upvotes: 1