Reputation: 91

Bash split command to split line in comma separated values

I have a large file with 2000 hostnames and I want to split it into multiple files of 25 hosts each, with the hosts in each file separated by commas and no trailing comma.

Large.txt:

host1
host2
host3
.
.
host10000

The split command below creates multiple files (file00, file01, ...), but the hosts are not comma-separated, so it is not the expected output.

split -d -l 25 large.txt file

The expected output is:

host1,host2,host3

Upvotes: 2

Answers (4)

markp-fuso

Reputation: 34114

You'll need to perform two separate operations: 1) split the file, and 2) reformat the files generated by split.

The first step is already done:

split -d -l 25 large.txt file

For the second step let's work with the results that are dumped into the first file by the basic split command:

$ cat file00
host1
host2
host3
...
host25

We want to pull these lines into a single line using a comma (,) as delimiter. For this example I'll use an awk solution:

$ cat file00 | awk '{ printf "%s%s", sep, $0 ; sep="," } END { print "" }'
host1,host2,host3...,host25

Where:

sep is initially undefined (aka empty string)
on each successive line processed by awk we set sep to a comma
the printf doesn't include a linefeed (\n) so each successive printf will append to the 'first' line of output
we END the script by printing a linefeed to the end of the file
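
The separator idiom described above can be tried on a few lines without creating any files. A minimal sketch, assuming three made-up hostnames in place of file00; paste -s -d, is shown alongside as a different standard tool that should produce the same single line:

```shell
# Hypothetical sample standing in for file00 (three hosts instead of 25).
sample=$(printf 'host1\nhost2\nhost3\n')

# awk: print the separator before each line; it is empty for the first line only.
awk_joined=$(printf '%s\n' "$sample" | awk '{ printf "%s%s", sep, $0; sep = "," } END { print "" }')

# paste: -s joins all input lines into one, -d, uses a comma as the delimiter.
paste_joined=$(printf '%s\n' "$sample" | paste -s -d, -)

echo "$awk_joined"
echo "$paste_joined"
```

Both commands should print host1,host2,host3 with no trailing comma.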

It just so happens that GNU split has an option, --filter, that runs a secondary script/code-snippet on each chunk to allow custom formatting of the output generated by split. A few issues to keep in mind:

the initial output from split is (effectively) piped as input to the command listed in the --filter option
because the command is wrapped in double quotes, it is necessary to escape (with backslash) certain characters in it (eg, double quotes, dollar sign) so as to keep the calling shell from interpreting them before split runs
the --filter option automatically has access to the current split outfile name using the $FILE variable

Pulling everything together gives us:

$ split -d -l 25 --filter="awk '{ printf \"%s%s\", sep, \$0 ; sep=\",\" } END { print \"\" }' > \$FILE" large.txt file
$ cat file00
host1,host2,host3...,host25
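
Since the backslashes are only needed because the filter is wrapped in double quotes, a single-quoted filter avoids them. A minimal sketch, assuming GNU split (for --filter) and swapping the awk command for paste so the filter contains no single quotes of its own; the 60 generated hostnames and the scratch directory are made up for the demonstration:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up input: host1 .. host60, one per line.
seq -f 'host%g' 1 60 > large.txt

# Single quotes pass $FILE through untouched; split sets it for each chunk.
split -d -l 25 --filter='paste -s -d, - > "$FILE"' large.txt file

# Each output file should now hold exactly one comma-separated line.
wc -l file*
cat file02
```

With 60 input lines this should leave file00 and file01 with 25 hosts each and file02 with the remaining 10.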

Upvotes: 1

MarcoLucidi

Reputation: 2177

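Using awk:

awk '
BEGIN            { PREFIX = "file"; n = 0; }
                 { hosts = hosts sep $0; sep = ","; }
# write the current batch to its own file, close it, then reset the accumulator
function flush(  out) { out = PREFIX n++; print hosts > out; close(out); hosts = ""; sep = ""; }
NR % 25 == 0     { flush(); }
# flush only a partial final batch, so no empty extra file is created
END              { if (hosts != "") flush(); }
' large.txt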

edit: improved comma separation handling stealing from markp-fuso's excellent answer :)
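
The same approach can be parameterised. A rough sketch, assuming hypothetical size and prefix variables passed with -v, and zero-padded names (part00, part01, ...) to mirror what split -d produces; the seven generated hostnames are only sample data:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq -f 'host%g' 1 7 |
awk -v size=3 -v prefix=part '
    # Write the pending batch to the next numbered file and reset the state.
    function emit(    out) {
        out = sprintf("%s%02d", prefix, n++)
        print hosts > out
        close(out)
        hosts = ""; sep = ""
    }
    { hosts = hosts sep $0; sep = "," }
    NR % size == 0 { emit() }
    # A partial batch remains only when the line count is not a multiple of size.
    END { if (hosts != "") emit() }
'

cat part00 part01 part02
```

Closing each file right after writing it keeps the number of open file handles constant, which matters more as the number of output files grows.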

Upvotes: 1

amit bhosale

Reputation: 482

You can use the following bash snippet:

INPUT FILE

~$ cat domainlist.txt
domain1.com
domain2.com
domain3.com
domain4.com
domain5.com
domain6.com
domain7.com
domain8.com

Script

#!/usr/bin/env bash

FILE_NAME=domainlist.txt
LIMIT=4
OUTPUT_PREFIX=domain_
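# Split before line LIMIT, then once more LIMIT lines later ({1} = one repeat).
csplit -f "${OUTPUT_PREFIX}" "${FILE_NAME}" "${LIMIT}" '{1}'
#=====#
# Join the lines of each piece with commas, in place (GNU sed).
for file in "${OUTPUT_PREFIX}"*; do
    echo "$file"
    sed -i ':a;N;$!ba;s/\n/,/g' "$file"
done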

OUTPUT

./mysplit.sh 

36
48
12
domain_00
domain_01
domain_02
~$ cat domain_00
domain1.com,domain2.com,domain3.com

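Change LIMIT, OUTPUT_PREFIX and the input file name to suit your needs. Note that csplit splits before line LIMIT, so the first file holds LIMIT-1 lines (three domains in the output above), and {1} repeats the split only once more, so raise the repeat count for longer inputs.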
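
For anyone wondering what the sed expression in the loop does, here is a small sketch of it on three sample lines. It assumes GNU sed, which accepts the semicolon-separated one-line form:

```shell
# :a        define a label named a
# N         append the next input line to the pattern space
# $!ba      unless this is the last line, jump back to label a
# s/\n/,/g  once everything is loaded, turn each embedded newline into a comma
joined=$(printf 'domain1.com\ndomain2.com\ndomain3.com\n' | sed ':a;N;$!ba;s/\n/,/g')
echo "$joined"
```

Because the whole file is accumulated in the pattern space before the substitution runs, this suits small pieces like the ones csplit produces here.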

Upvotes: 0

alani

Reputation: 13059

Using the --filter option on GNU split:

split -d -l 25 --filter="(perl -ne 'chomp; print \",\" if \$i++; print'; echo) > \$FILE" large.txt file

Upvotes: 1
