Mauricio Calvao

Reputation: 475

Reformatting and cleaning a CSV file with curly braces matched across distinct lines

I would like to reformat a plain ASCII file, test.txt, containing (this is just a 10-line sample out of several hundred lines):

{0.91, 0.87, -69.79, 
  -0.3149, 0.05}, {0.9392, 
      1.089, 69, -0.31, 
      0.052}, {-0.8768, 0.7025, 
      69.80, -0.314, 0.053}, 
     {0.930, -1.2638750861516, 69.79, 
      0.314, 0.05301}, {0.9367, 
      -1.368063705085268, 69.79962, -0.31, 
      0.052}, {0.946, -1.644, 
      69.7, 0.3, 0.052}

to a final file, test_processed.txt, containing (for the same sample):

0.91, 0.87, -69.79, -0.3149, 0.05 
0.9392, 1.089, 69, -0.31, 0.052 
-0.8768, 0.7025, 69.80, -0.314, 0.053 
0.930, -1.2638750861516, 69.79, 0.314, 0.05301 
0.9367, -1.368063705085268, 69.79962, -0.31, 0.052 
0.946, -1.644, 69.7, 0.3, 0.052

That is, a plain CSV file, with each line containing exactly the five fields within the original pairs of matched braces.

I tried fiddling with gawk and regexes but could not figure out how to do this. I have the feeling that adjusting awk's RS and ORS variables might help, but I could not get any further.

Upvotes: 2

Views: 132

Answers (3)

anubhava

Reputation: 785581

With GNU awk, you can set RS to match each {...} group, then strip the leading {, the trailing }, and the embedded line breaks from the matched text:

awk -v RS='{[^}]+}' 'RT{gsub(/^{|}$|\n */, "", RT); print RT}' file
0.91, 0.87, -69.79, -0.3149, 0.05
0.9392, 1.089, 69, -0.31, 0.052
-0.8768, 0.7025, 69.80, -0.314, 0.053
0.930, -1.2638750861516, 69.79, 0.314, 0.05301
0.9367, -1.368063705085268, 69.79962, -0.31, 0.052
0.946, -1.644, 69.7, 0.3, 0.052

How it works:

  • -v RS='{[^}]+}': sets the record separator to a regex that matches one {...} group
  • RT: the pattern; it is true when RT is non-empty. GNU awk sets RT to the input text that matched the RS pattern.
  • {...}: the awk action block, run when the pattern is true
  • gsub(/^{|}$|\n */, "", RT): removes the starting {, the ending }, and each line break followed by 0 or more spaces from RT
  • print RT: prints the modified RT
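The gsub cleanup can be tried in isolation. The following is a minimal sketch, not the answer's exact command: an ordinary variable s (a hypothetical stand-in for one RT value) holds a braced group split across lines, and the braces are written as [{] and [}], a bracketed spelling assumed here so that awks other than GNU awk also accept the pattern.

```shell
# s is a hypothetical stand-in for one RT value: a braced group
# whose fields are spread over three input lines.
cleaned=$(awk 'BEGIN {
  s = "{0.9392, \n      1.089, 69, -0.31, \n      0.052}"
  # Remove the leading brace, the trailing brace, and every
  # newline together with the spaces that follow it.
  gsub(/^[{]|[}]$|\n */, "", s)
  print s
}')
printf '%s\n' "$cleaned"
```

Because only the newline and the indentation after it are removed, the single space that precedes each line break in the input is what keeps the fields separated in the result.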

Upvotes: 3

Ed Morton

Reputation: 204015

With GNU awk for multi-char RS and RT:

$ awk -v RS='{[^}]+}' 'RT{$0=RT; gsub(/[{}]/,""); $1=$1; print}' file
0.91, 0.87, -69.79, -0.3149, 0.05
0.9392, 1.089, 69, -0.31, 0.052
-0.8768, 0.7025, 69.80, -0.314, 0.053
0.930, -1.2638750861516, 69.79, 0.314, 0.05301
0.9367, -1.368063705085268, 69.79962, -0.31, 0.052
0.946, -1.644, 69.7, 0.3, 0.052
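The $1=$1 assignment makes awk rebuild the record from its fields, which squeezes every run of spaces and newlines into a single space. That step does not itself need GNU extensions. As a rough sketch for awks without a regex RS or RT (a swapped-in variant, not the command above), one could split records on the closing brace instead; the sample variable and its two records are assumptions made for illustration.

```shell
# Hypothetical two-record sample in the question's layout.
sample='{0.91, 0.87, -69.79, 
  -0.3149, 0.05}, {0.9392, 
      1.089, 69, -0.31, 
      0.052}'

rows=$(printf '%s\n' "$sample" | awk -v RS='}' '
  # Drop everything up to and including the opening brace; records
  # without one, such as the trailing newline, are skipped.
  sub(/^[^{]*[{]/, "") { $1 = $1; print }
')
printf '%s\n' "$rows"
```

Each record ends at a closing brace, so whatever sits between groups (the comma, spaces, and line breaks) lands at the front of the next record and is removed by the sub call.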

Upvotes: 0

RavinderSingh13

Reputation: 133630

Please try the following with GNU awk; it was written and tested against the samples shown.

awk -v RS="" -v FS="[}{]" '{for(i=2;i<=NF;i+=2){gsub(/\n+ +/," ",$i);print $i}}' Input_file

The output will be as follows:

0.91, 0.87, -69.79,  -0.3149, 0.05
0.9392,  1.089, 69, -0.31,  0.052
-0.8768, 0.7025,  69.80, -0.314, 0.053
0.930, -1.2638750861516, 69.79,  0.314, 0.05301
0.9367,  -1.368063705085268, 69.79962, -0.31,  0.052
0.946, -1.644,  69.7, 0.3, 0.052
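Whichever command produces the file, the question's requirement of exactly five fields per line can be checked afterwards. The following is only a sketch: check_fields is a hypothetical helper name, and the field separator is assumed to be a comma followed by any number of spaces, so the doubled spaces in the output above still count as one separator.

```shell
# check_fields is a hypothetical helper: it reports every line that does
# not have exactly five comma-separated fields and exits non-zero if any
# such line was found.
check_fields() {
  awk -F', *' '
    NF != 5 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END { exit (bad + 0) }
  ' "$@"
}

# A well-formed line passes silently; a four-field line is reported.
good=$(printf '%s\n' '0.91, 0.87, -69.79,  -0.3149, 0.05' | check_fields && echo ok)
report=$(printf '%s\n' '0.9367, -1.36, 69.79962, -0.31' | check_fields || true)
printf '%s\n%s\n' "$good" "$report"
```

With a real result file, the call would be check_fields test_processed.txt, using the file name from the question.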

Upvotes: 1
