
Split file based on delimiter and then join into separate lines

I have a file, example.txt

0
   A
   B
   C, C, C
   D, D
   E
   F
1
   A, A, A
   B
   C
2
   A
   B
   C
   D, D, D
   E

I need to split the file at every line that holds a number, take the lines between consecutive numbers, and join each group into a single comma-separated line, repeating this for every section of the file. The expected output is:

A, B, C, C, C, D, D, E, F
A, A, A, B, C
A, B, C, D, D, D, E

The best I've come up with is:

cat example.txt | sed -e '1,/^[0-9]/d' -e '/^[0-9]/,$d' | paste -sd "," -

A, A, A,   B,   C

In this case that prints only the middle section; my other attempts printed all sections on a single line.

Upvotes: 2

Answers (5)

ctac_

Reputation: 2491

Another sed solution:

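sed -n '
# Pull in the first content line (line 1 is the first number).
N
:A
# On the last input line, go straight to the join-and-print step.
$bB
# Keep appending lines until the newest one is a number line.
/\n[ ]*[0-9][0-9]*$/!{
N
bA
}
# Save the block (the hold space keeps the next number line), then remove that number line before printing.
h
s/\n[^\n]*$//
:B
# Drop the leading number line and the indentation, then join the remaining lines with ", ".
s/[^\n]*\n[ ]*//
s/\n[ ]*/, /g
p
$b
# Restore the saved block and keep only its last line (the next number) to start the next section.
x
s/.*\n//
bA
' infile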

Upvotes: 1

George Vasiliou

Reputation: 6345

A shorter idiomatic awk alternative:

$ awk '$1=$1{printf "%s%s",$0,(RT==","?OFS:ORS)}' RS="[0-9]|," OFS=", " file1
A, B, C, C, C, D, D, E, F
A, A, A, B, C
A, B, C, D, D, D, E

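RS is the Record Separator. The default is a newline; here it is set to the regular expression [0-9]|, so that every digit and every comma ends a record.
OFS is the Output Field Separator, here a comma followed by a single space.
RT holds the text that actually matched RS for the current record. It is a GNU awk extension, so this command needs gawk.
ORS is the Output Record Separator, a newline by default.
$1=$1 is an idiomatic assignment that forces awk to rebuild $0 from its fields, joined with OFS. With the default FS, newlines inside a record also separate fields, so the indentation and line breaks collapse into ", ". Used as a pattern, the assignment is also false for empty records, so those are skipped.
(RT==","?OFS:ORS) is a ternary expression with the syntax (condition ? value if true : value if false). After a record that ended at a comma it prints OFS; after one that ended at a digit or at end of file (RT is then empty) it prints ORS.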
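The $1=$1 idiom can be checked on its own. The following is a minimal sketch with made-up input strings; it uses only POSIX awk features (-v, the default FS, no RT), so unlike the answer itself it should also run outside gawk.

```shell
# Assigning $1 to itself makes awk rebuild $0 from its fields, joined with OFS.
# Leading, trailing and repeated blanks therefore disappear.
echo '   A   B   C  ' | awk -v OFS=', ' '{ $1 = $1; print }'
# prints: A, B, C

# Used as a pattern, the value of the assignment (the new $1) is also the
# condition, so a record with no fields (empty $1) is not printed.
printf 'X Y\n   \nZ\n' | awk -v OFS='-' '$1 = $1'
# prints: X-Y
#         Z
```

One caveat of using the assignment as a pattern: a record whose first field is 0 is also treated as false and skipped. That does not matter for the letters in this question.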

Upvotes: 5

Bach Lien

Reputation: 1060

sed:

 echo `sed 's:$:,:' example.txt` | sed -r 's:^:, :;s:,\s*[0-9]+,\s*:\n:g;s:^\s*::;s:,? *$::'

perl:

 perl -p0777e 's:^:, :;s:\n\s*:, :g;s:,\s*[0-9]+,\s*:\n:g;s:^\s*::;s:,?\s*$:\n:' example.txt

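echo..., or perl -p0777... - treat the whole file as a single long line (in the echo version, the unquoted command substitution turns every newline and run of spaces into a single space; in the perl version, -0777 reads the whole file at once, so the newlines are still there)
s:^:, : - add an extra comma at the beginning, so the first number is surrounded by commas too
s:\n\s*:, :g - (perl only) replace each newline, plus the indentation after it, with ", "; in the sed version, sed 's:$:,:' together with echo has already done this
s:,\s*[0-9]+,\s*:\n:g - replace every number surrounded by commas with a newline
s:^\s*:: and s:,?\s*$:\n: - remove the newline left at the start by the first number, and replace the trailing comma with a final newline (the sed version simply deletes it with s:,? *$::)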
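The flattening step of the sed version can be run on its own to see what the later substitutions work on. This sketch holds the question's sample in a shell variable named sample (an illustrative stand-in for example.txt) and uses $(...) in place of the backticks, which behaves the same here.

```shell
sample='0
   A
   B
   C, C, C
   D, D
   E
   F
1
   A, A, A
   B
   C
2
   A
   B
   C
   D, D, D
   E'

# Append a comma to every line; the unquoted $(...) is then word-split by the
# shell, and echo rejoins the words with single spaces into one long line.
echo $(printf '%s\n' "$sample" | sed 's:$:,:')
# prints: 0, A, B, C, C, C, D, D, E, F, 1, A, A, A, B, C, 2, A, B, C, D, D, D, E,
```

Because the substitution is unquoted, each word is also subject to filename globbing, so data containing characters such as * or ? could be altered by this step.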

Upvotes: 2

RavinderSingh13

Reputation: 133770

The following awk command may also help:

awk '/^[0-9]+/ && val{print val;val="";next} FNR>1{sub(/^ +/,"");val=val?val ", " $0:$0} END{print val}'  Input_file

Explanation: here is the same command in non-one-liner form, with comments.

awk '
/^[0-9]+/ && val{        ##If the line starts with digit(s) and val is not empty, then:
  print val;             ##print the section collected so far,
  val="";                ##reset val for the next section,
  next                   ##and skip the remaining rules for this line.
}
FNR>1{                   ##For every line after the first one:
  sub(/^ +/,"");         ##strip the leading spaces from the current line,
  val=val?val ", " $0:$0 ##then append it to val, separated by ", " (no separator before the first item).
}
END{                     ##The END block runs once the whole Input_file has been read:
  print val              ##print the last section.
}
'  Input_file            ##Name of the input file.
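The val=val?val ", " $0:$0 line is a common awk idiom for joining items with a separator without producing a leading one. A minimal sketch of the idiom in isolation, on made-up input:

```shell
# First record: val is empty (false), so the ternary picks $0 alone.
# Later records: val is non-empty, so ", " and the new record are appended.
printf 'A\nB\nC\n' | awk '{ val = val ? val ", " $0 : $0 } END { print val }'
# prints: A, B, C
```

Since string concatenation binds more tightly than ?:, no parentheses are needed around val ", " $0.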

Upvotes: 2

John1024

Reputation: 113994

Try:

$ awk 'function prn(line) {if(line){gsub(/[[:space:]]+/, " ", line); print line}}  /^[0-9]/{prn(line); line=""; next} {if(line)line=line"," $0; else line=$0} END{prn(line)}' example.txt
 A, B, C, C, C, D, D, E, F
 A, A, A, B, C
 A, B, C, D, D, D, E

Or, for those who prefer code spread over multiple lines:

awk 'function prn(line)
      {
          if(line){
              gsub(/[[:space:]]+/, " ", line)
              print line
           }
       }

       /^[0-9]/{
           prn(line)
           line=""
           next
       }

       {
           if(line)
               line=line"," $0
           else
               line=$0
       }

       END{
           prn(line)
       }' example.txt

How it works

function prn(line) {if(line){gsub(/[[:space:]]+/, " ", line); print line}}

This defines a function prn which, if line is not empty, replaces every run of whitespace with a single space and prints the result.
/^[0-9]/{prn(line); line=""; next}

If the current line starts with a number, call prn on the contents of line, reset line to an empty string, and skip the remaining rules, moving on to the next input line.
{if(line)line=line"," $0; else line=$0}

Append the current line to the variable line, separated by a comma (the first line of a section is stored as-is).
END{prn(line)}

After we have reached the end of the file, call prn on line.
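Because line is built from the raw input lines, it still contains their indentation when prn is called. The gsub call collapses each run of whitespace to one space but does not remove it, which is why every output line above starts with a space. A small sketch of that step on a made-up string, plus one possible way (not part of the answer) to drop the leading space:

```shell
# Same gsub call as in prn: each whitespace run becomes a single space.
awk 'BEGIN { line = "   A,   B"; gsub(/[[:space:]]+/, " ", line); print "[" line "]" }'
# prints: [ A, B]

# An extra sub() after the gsub removes the single leading space.
awk 'BEGIN { line = "   A,   B"; gsub(/[[:space:]]+/, " ", line); sub(/^ /, "", line); print "[" line "]" }'
# prints: [A, B]
```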

Upvotes: 2
