user1955215

Reputation: 743

AWK Multilines into Single line record

The data below has to be converted into a single-line string:

01 |
0101001001 |
DD-01-001-001-001/57 |
1 |
Vijay Raghavan |
 |
3096 |
Govind Industries |
 |
 |
 |
 |
 |
 |
 |
  </EmployeeData>


using the code below (in a .awk file):

#BEGIN {FS ="\n" ; RS="[</EmployeeData>]"}
#{
#for (i=1; i<=NF; i++)
#print $i","
#}

There is no output. Please help. Thanks in advance.

Upvotes: 1

Views: 2936

Answers (4)

Ed Morton

Reputation: 203502

# is the awk comment-start character. Every line in your posted .awk file is commented out, hence no output. Also, RS="[</EmployeeData>]" does not set RS to the string </EmployeeData>, as I suspect was intended. The brackets [] delimit a character list, so in GNU awk (which treats a multi-character RS as a regular expression) RS matches any single one of the characters < / E m p l o y e D a t >.
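To see the difference between the character list and the plain string, here is a small sketch that counts how many matches each pattern finds in one sample line. The sample text 1</EmployeeData>2 and the helper name count_matches are made up for illustration; the sketch relies only on gsub() returning the number of substitutions it made.

```shell
# count_matches PATTERN: hypothetical helper that reports how many times
# PATTERN (used as a dynamic regex) matches in a fixed sample line,
# followed by the line with every match replaced by "#".
count_matches() {
    printf '%s\n' '1</EmployeeData>2' |
        awk -v pat="$1" '{ n = gsub(pat, "#"); print n, $0 }'
}

count_matches '[</EmployeeData>]'   # character list: each listed character matches on its own
count_matches '</EmployeeData>'     # plain string: the whole tag matches once
```

The first call should report 15 matches (one per character of the tag) and the second call just 1, which is the behaviour you want from a record separator.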

I think this is probably what you really are looking for (uses GNU awk for multi-char RS):

$ cat file
01 |
0101001001 |
DD-01-001-001-001/57 |
1 |
Vijay Raghavan |
 |
3096 |
Govind Industries |
 |
 |
  </EmployeeData>
02 |
0202002002 |
DD-01-001-001-001/57 |
1 |
Bob Shmobswort |
 |
1234 |
Some Other Places |
 |
 |
  </EmployeeData>

$ cat tst.awk
BEGIN{FS="[[:space:]]*[|][[:space:]]*"; OFS=","; RS="</EmployeeData>[[:space:]]*"}
{ $1=$1; print }

$ awk -f tst.awk file
01,0101001001,DD-01-001-001-001/57,1,Vijay Raghavan,,3096,Govind Industries,,,
02,0202002002,DD-01-001-001-001/57,1,Bob Shmobswort,,1234,Some Other Places,,,
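If your awk does not support a multi-character RS, roughly the same result can be sketched with ordinary line-by-line reading, accumulating lines until the </EmployeeData> terminator shows up. This is a different technique from the script above, not a drop-in copy of it. The function name join_records and the variable names are illustrative, and it assumes every record ends with a line containing </EmployeeData>, as in the sample file.

```shell
# join_records [FILE...]: hypothetical helper that turns each
# </EmployeeData>-terminated block into one comma-separated line.
join_records() {
    awk '
    {
        # sub() returns 1 only on the terminator line and strips the tag from $0.
        is_end = sub(/<\/EmployeeData>.*/, "")
        rec = rec $0 "\n"                        # accumulate the raw lines of this record
    }
    is_end {
        # Split on "|" together with any surrounding blanks or newlines.
        n = split(rec, f, /[ \t\n]*[|][ \t\n]*/)
        out = f[1]
        for (i = 2; i <= n; i++)
            out = out "," f[i]
        print out
        rec = ""                                 # start the next record
    }' "$@"
}

join_records file
```

The separator regex is written with [ \t\n] instead of [[:space:]] only because some older awks lack POSIX character classes.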

Upvotes: 0

Kent

Reputation: 195059

 awk -v RS="" '{$1=$1}7'  file

The above line merges all lines into one, including the </EmployeeData> line, because RS="" selects paragraph mode and the input contains no blank lines.
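A quick sketch of what paragraph mode does, using made-up input: with RS="" records are separated by blank lines, $1=$1 rebuilds each record with single spaces, and the always-true pattern 7 prints it. The wrapper name merge_paragraphs is only for illustration.

```shell
# merge_paragraphs [FILE...]: hypothetical wrapper around the one-liner above.
merge_paragraphs() {
    awk -v RS="" '{ $1 = $1 } 7' "$@"
}

# Two blocks separated by a blank line come out as two lines.
printf '%s\n' 'a |' 'b |' '' 'c |' 'd |' | merge_paragraphs
# No blank line at all: everything ends up in one record, hence one line.
printf '%s\n' 'a |' 'b |' 'c |' | merge_paragraphs
```

So if the real file ever contains blank lines between records, this approach gives one output line per block rather than one line for the whole file.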

Upvotes: 2

Jotne

Reputation: 41456

Try this awk command:

awk -F"\n" -v RS="</EmployeeData>" '{$1=$1}1' file
01 | 0101001001 | DD-01-001-001-001/57 | 1 | Vijay Raghavan |  | 3096 | Govind Industries |  |  |  |  |  |  |  |

If you want , as the separator, use:

awk -F"\n" -v RS="</EmployeeData>" '{$1=$1;gsub(/ \| /,",")}1' file
01,0101001001,DD-01-001-001-001/57,1,Vijay Raghavan,,3096,Govind Industries,,,,,,,,
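The $1=$1 assignment in these commands may look like a no-op, so here is a minimal sketch of what it does, on a made-up one-line input: awk only rebuilds $0 with the output field separator once a field has been assigned.

```shell
# No field is assigned, so the record is printed as read and OFS is not applied.
printf '%s\n' 'a|b|c' | awk -F'|' -v OFS=',' '1'
# Assigning a field (even to itself) rebuilds $0 with OFS between the fields.
printf '%s\n' 'a|b|c' | awk -F'|' -v OFS=',' '{ $1 = $1 } 1'
```

The first command should print a|b|c and the second a,b,c. In the commands above the same mechanism replaces the newlines between fields with the default OFS, a single space.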

Upvotes: 2

Simon

Reputation: 10841

There were a couple of potential problems. First, the value of RS did not match the text it was intended to match: the square brackets make it a character list rather than the string </EmployeeData>. Second, print automatically appends a newline to whatever it prints, so the output would be on multiple lines anyway.

The following script solves both problems:

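# Each line of a record is one field; a multi-character RS needs an awk that supports it (e.g. GNU awk).
BEGIN { FS = "\n"; RS = "</EmployeeData>" }
{
    # printf adds no newline, so all fields of a record stay on one output line.
    for (i = 1; i <= NF; i++)
        printf "%s,", $i
    printf "\n"
}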
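The print versus printf point can be checked in isolation with a tiny made-up input, independent of the RS question:

```shell
# print appends ORS (a newline by default) after every call: two output lines.
printf '%s\n' 'a' 'b' | awk '{ print $0 "," }'
# printf writes only what the format says, so both values stay on one line.
printf '%s\n' 'a' 'b' | awk '{ printf "%s,", $0 } END { printf "\n" }'
```

The first command should give a, and b, on separate lines, the second a,b, on a single line.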

Upvotes: 1
