akp

Reputation: 91

Bash split command to split line in comma separated values

I have a large file with 2000 hostnames, and I want to split it into multiple files with 25 hosts per file. Within each file the hosts should be on a single line, separated by commas, with no trailing comma.

large.txt:

host1
host2
host3
.
.
host10000

The split command below creates multiple files like file00, file01 ... however, the hosts are not comma-separated, so it's not the expected output.

split -d -l 25 large.txt file

The expected output is:

host1,host2,host3

Upvotes: 2

Views: 2185

Answers (4)

markp-fuso

Reputation: 34114

You'll need to perform 2 separate operations ... 1) split the file and 2) reformat the files generated by split.

The first step is already done:

split -d -l 25 large.txt file

For the second step let's work with the results that are dumped into the first file by the basic split command:

$ cat file00
host1
host2
host3
...
host25

We want to pull these lines into a single line using a comma (,) as delimiter. For this example I'll use an awk solution:

$ cat file00 | awk '{ printf "%s%s", sep, $0 ; sep="," } END { print "" }'
host1,host2,host3...,host25

Where:

  • sep is initially undefined (aka empty string)
  • on each successive line processed by awk we set sep to a comma
  • the printf doesn't include a linefeed (\n) so each successive printf will append to the 'first' line of output
  • we END the script by printing a linefeed to the end of the file
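
As a minimal sketch of the join step on its own, the snippet below wraps the same awk idiom in a small function and runs it on three sample lines. The function name join_lines and the sample data are assumptions for illustration, not part of the answer above; paste -s is shown as an alternative that produces the same result when the delimiter is a single character.

```shell
#!/usr/bin/env bash
# join_lines is a hypothetical helper name used only for this illustration.
# It joins all lines of the file given as $1 into one comma-separated line.
join_lines() {
    awk '{ printf "%s%s", sep, $0; sep = "," } END { print "" }' "$1"
}

# Small sample standing in for file00 (assumed data).
sample=$(mktemp)
printf 'host1\nhost2\nhost3\n' > "$sample"

joined=$(join_lines "$sample")
echo "$joined"    # host1,host2,host3

# paste -s serializes the lines of a file, -d, sets the delimiter.
pasted=$(paste -sd, "$sample")
echo "$pasted"    # host1,host2,host3

rm -f "$sample"
```

Neither variant leaves a trailing comma, because the separator is only emitted between lines.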

It just so happens that split has an option to call a secondary script/code-snippet to allow for custom formatting of the output (generated by split); the option is --filter. A few issues to keep in mind:

  • the initial output from split is (effectively) piped as input to the command listed in the --filter option
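  • because the --filter argument is wrapped in double quotes here, it is necessary to escape (with backslash) certain characters in the command (eg, double quotes, dollar sign) so as to keep them from being interpreted by the invoking shell before split ever sees the command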
  • the --filter option automatically has access to the current split outfile name using the $FILE variable

Pulling everything together gives us:

$ split -d -l 25 --filter="awk '{ printf \"%s%s\", sep, \$0 ; sep=\",\" } END { print \"\" }' > \$FILE" large.txt file
$ cat file00
host1,host2,host3...,host25
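
If the backslashes get hard to read, one possible variation is to put the filter in single quotes, so that nothing needs escaping. The sketch below assumes GNU split and swaps the awk command for paste -sd, (a substitution for illustration, since awk's own single quotes would clash with the outer ones). The scratch directory and the 60 generated host names are assumed sample data.

```shell
#!/usr/bin/env bash
# Work in a scratch directory with generated sample data (assumption: 60 hosts).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq -f 'host%g' 1 60 > large.txt

# Single quotes keep $FILE literal here; the shell that split starts
# for each chunk expands it to the current output file name.
split -d -l 25 --filter='paste -sd, > "$FILE"' large.txt file

file_count=$(ls file?? | wc -l)
last_line=$(cat file02)
echo "$file_count files, last one: $last_line"
```

With 60 input lines and 25 lines per chunk this yields file00, file01 and file02, the last one holding host51 through host60.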

Upvotes: 1

MarcoLucidi

Reputation: 2177

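Using awk:

awk '
function flush() {
    if (hosts == "") return      # nothing pending, so no empty trailing file
    out = PREFIX n++
    print hosts > out
    close(out)                   # avoid running out of open files on large inputs
    hosts = ""; sep = ""
}
BEGIN        { PREFIX = "file"; n = 0 }
             { hosts = hosts sep $0; sep = "," }
NR % 25 == 0 { flush() }
END          { flush() }
' large.txt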

edit: improved comma separation handling stealing from markp-fuso's excellent answer :)

Upvotes: 1

amit bhosale

Reputation: 482

You can use the bash snippet below.

INPUT FILE

~$ cat domainlist.txt
domain1.com
domain2.com
domain3.com
domain4.com
domain5.com
domain6.com
domain7.com
domain8.com

Script

#!/usr/bin/env bash

FILE_NAME=domainlist.txt
LIMIT=4
OUTPUT_PREFIX=domain_
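# Split before line LIMIT, then once more after another LIMIT lines ({1})
csplit -f "${OUTPUT_PREFIX}" "${FILE_NAME}" "${LIMIT}" '{1}'
#=====#
for file in "${OUTPUT_PREFIX}"*; do
    echo "$file"
    # Read the whole file into the pattern space (GNU sed), then turn newlines into commas
    sed -i ':a;N;$!ba;s/\n/,/g' "$file"
done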

OUTPUT

./mysplit.sh 

36
48
12
domain_00
domain_01
domain_02
~$ cat domain_00
domain1.com,domain2.com,domain3.com

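Change LIMIT, OUTPUT_PREFIX (the output file name prefix) and FILE_NAME (the input file) as per your requirement. Note that csplit splits before line LIMIT, so the first output file holds LIMIT - 1 lines (3 in the run above), and {1} repeats the split only once more.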

Upvotes: 0

alani

Reputation: 13059

Using the --filter option on GNU split:

split -d -l 25 --filter="(perl -ne 'chomp; print \",\" if \$i++; print'; echo) > \$FILE" large.txt file

Upvotes: 1
