user2647888

Reputation: 721

How to split a file into multiple files based on a delimiter, and remove the delimiter also, in Unix

I have a file that somewhat looks like this:

{1:F195}{2:O5350646}{3:{1028:076}}{4:
:16R:GL
:16R:ADD
:19A::P//U9,1
:16S:AFO
-}{5:{MAC:00}{CHK:1C}}{S:{SAC:}{COP:S}{MAN:P2}}${1:33339}{2:O53}{4:
:16S:G
:16R:A
:19A::H0,
:19A::H0,
:16S:ADDINFO
-}{5:{MAC:0}{CHK:4}}{S:{SAC:}{COP:S}{MAN:GP2}}

Now I want to split this single file into two files based on the delimiter $ and then remove the delimiter also. Any help would be greatly appreciated :)

I have used the following logic:

  1. First, insert a newline before every occurrence of $.
  2. I'm able to create multiple files, but they still contain the delimiter.

Code:

FILE=test.dat
sed 's/\$/\n&/g' $FILE > Inter_$FILE 
FILE=Inter_$FILE

cat $FILE | while read line
do
            sleep 1
            FormattedDate=`date +%Y%m%d%H%M%S`
            Final_FILE=New_${FormattedDate}_$FILE

            echo "line --- $line"
            echo "FormattedDate --- $FormattedDate"
            Line_Check=`echo $line | tr '$' '@' |  cut -c1`
            ##Line_Check=`sed -e 's/\$/@/g' $line |  cut -c1`
            echo "Line_Check --- $Line_Check"
            echo "Final_FILE --- $Final_FILE"

            if [ "$Line_Check" = "@" ]
            then
                           Final_FILE=New_$FormattedDate_$FILE
                           FILE=$Final_FILE

                           echo "FOUND In  --- $line"
                           echo "FILE  --->>>  $FILE"

            else
                           FILE=$Final_FILE
                           echo "FILE  --->>>  $FILE"
                           ###`echo $line |  cut -c2-` >>
                           ###cat $line` >> $FILE
                           ###Filter_Line=`echo $line`
                           ###echo "Filter_Line  --- $Filter_Line"
            fi

            echo $line >> $FILE

            ###sed 's/^@//' $FILE > 3_$FILE

done

sed 's/^\$//' $FILE >> Final_$FILE;

Upvotes: 2

Views: 4461

Answers (2)

Steve

Reputation: 54392

I think you may be trying to reinvent the wheel. awk is a great tool that can be used to split files on delimiters and perform other text processing. You may like to try the following:

awk '{ for(i=1;i<=NF;i++) print $i > ("file_" i ".txt") }' RS= FS='\\$' file

Results:

Contents of file_1.txt:

{1:F195}{2:O5350646}{3:{1028:076}}{4:
:16R:GL
:16R:ADD
:19A::P//U9,1
:16S:AFO
-}{5:{MAC:00}{CHK:1C}}{S:{SAC:}{COP:S}{MAN:P2}}

Contents of file_2.txt:

{1:33339}{2:O53}{4:
:16S:G
:16R:A
:19A::H0,
:19A::H0,
:16S:ADDINFO
-}{5:{MAC:0}{CHK:4}}{S:{SAC:}{COP:S}{MAN:GP2}}

Explanation:

Set the Record Separator to null, which puts awk in 'paragraph mode' (by default RS is set to "\n", which enables line-by-line processing). Since your file doesn't look like it contains paragraphs, this will essentially treat your file as a single record. We then set the Field Separator to a dollar-sign character (which needs to be escaped). So for each record (and there should only be one record) we loop over each field (NF is short for Number of Fields) and print it to a file using the iterator. It's worthwhile noting that you will get strange results if your input contains multiple paragraphs. In comparison with Glenn's answer above/below, his solution won't have this problem, but the last file it processes will contain a trailing newline. HTH.
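To make the field loop easier to see in isolation, here is a minimal sketch that runs the same kind of NF loop over a single line, using the default record separator instead of paragraph mode. The input string and the `fields` variable are made up for illustration; it prints to standard output rather than to files.

```shell
# Hypothetical one-line input; with the default RS, this line is a single record.
# -F'$' sets the field separator to a literal dollar sign.
fields=$(printf 'alpha$beta$gamma\n' |
    awk -F'$' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

# Show each field prefixed with its field number.
printf '%s\n' "$fields"
```

Each numbered line corresponds to one `$`-separated field, which is the value that the command above redirects to its own file.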

Upvotes: 2

glenn jackman

Reputation: 246744

Use awk; it has a dedicated "input record separator" variable:

awk -v RS='$' '{ outfile = "output_file_" NR; print > outfile}' filename      

This program writes each record (each chunk of text between $ delimiters) to a separate file, with the record number as a suffix ("output_file_1", "output_file_2"). The delimiter itself is not written.
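As a minimal sketch building on the command above, the variant below also strips the newline that ends the input, so the last output file does not finish with an extra blank line. The sample file name and its contents are made up for illustration, and the `sub()` and `close()` calls are additions that are not part of the original command.

```shell
# Work in a scratch directory so the sample files don't clutter anything else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: two records separated by a literal "$".
printf 'first\nrecord$second\nrecord\n' > sample.dat

# RS='$' makes each $-delimited chunk one record; the delimiter is dropped.
awk -v RS='$' '{
    sub(/\n$/, "")               # drop a newline at the very end of the record
    outfile = "output_file_" NR  # one output file per record number
    print > outfile
    close(outfile)               # release the file handle before the next record
}' sample.dat
```

Closing each file as soon as it is written is only a precaution for inputs with many records; for a two-record file it makes no practical difference.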

Upvotes: 3
