Chirag Raj

Reputation: 35

Unix command to remove newline within a field

EDIT: Thank you all for answering the question, all the answers work. Stack Overflow indeed has a great community.

I receive a flat file as a source. In one of the fields, the value is split across several lines, and I need to remove those newlines so that the value sits on a single line.

Example: the file is as below:

PO,MISC,yes,"This
is
an
example"
PO,MISC,yes,"This
is
another
example"

In the example above, the data is read as 9 lines, but each record needs to be read as a single line, as shown below:

PO, MISC, yes, "This is an example"
PO, MISC, yes, "This is another example"

I tried the command below, but it did not work. Is there any way to achieve this? I also need to write the result to another file.

Command:

awk -v RS='([^,]+\\,){4}[^,]+\n' '{gsub(/\n/,"",RT); print RT}' sample_attachments.csv > test.csv

Upvotes: 1

Views: 939

Answers (7)

M. Nejat Aydin

Reputation: 10123

If sed is allowed, then

sed ':a
     /^[^"]*\("[^"]*"[^"]*\)*$/b
     N
     s/\n/ /
     ba
' file

Upvotes: 1

petrus4

Reputation: 614

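The input file is assumed to be named "file" and the output "newfile". Each record is assumed to span exactly 4 lines, as in the sample.

#!/bin/sh
# Peel 4 lines at a time off a working copy and join them into one line.

cp file stack

# ed script: write lines 1-4 to f1, delete them from stack, save.
cat > ed1 <<EOF
1,4w f1
1,4d
wq
EOF

# [ -s stack ] is true while stack is non-empty.
while [ -s stack ]; do
    # Stop if ed fails, e.g. when fewer than 4 lines remain.
    ed -s stack < ed1 || break
    # Join the 4 lines with spaces and end the record with a newline.
    paste -s -d ' ' f1 >> newfile
done

rm -f ./ed1 ./f1 ./stack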
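
The script above peels 4 lines at a time off the input and joins them. Under that same assumption (every record spans exactly 4 lines), a shorter sketch is possible with paste, which reads one line per "-" operand and joins them with the given delimiter. The function name join_every_four is only an illustrative name, not something from the script above.

```shell
# Sketch only: assumes every record spans exactly 4 physical lines.
# join_every_four is a hypothetical helper name.
join_every_four() {
    # Each "-" consumes one line of stdin; the 4 lines are joined with a space.
    paste -d ' ' - - - -
}
```

Usage would look like `join_every_four < file > newfile`. If any record spans a different number of lines, the output goes out of step from that record onward, so this only fits input as regular as the sample.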

Upvotes: 1

Ed Morton

Reputation: 203522

Using any awk:

$ awk -v RS='"' -v ORS= '!(NR%2){gsub(/\n/,OFS); $0="\"" $0 "\""} 1' file
PO,MISC,yes,"This is an example"
PO,MISC,yes,"This is another example"

For anything else, see "What's the most robust way to efficiently parse CSV using awk?".

Upvotes: 2

RavinderSingh13

Reputation: 133518

With your shown samples only, please try the following awk, written and tested in GNU awk. A simple explanation: set RS to "\n (a double quote followed by a newline) and set the field separator to a comma. In the main block, globally substitute newlines with spaces in $NF, then use printf to print the current line followed by the value of RT.

awk -v RS="\"\n"  'BEGIN{FS=OFS=","} {gsub(/\n/," ",$NF);printf("%s",$0 RT)}' Input_file

Upvotes: 3

anubhava

Reputation: 785156

You may use this gnu-awk solution:

awk -v RS='"[^"]*"' '{ORS = gensub(/\n/, " ", "g", RT)} 1' file

field1, field2, field3, field4
PO,MISC,yes,"This is an example"
PO,MISC,no,"This is  another example"

Where input file is this:

cat file
field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"
PO,MISC,no,"This
is
another
example"

For the updated question use this awk:

awk -F, -v OFS=", " -v RS='"[^"]*"|\n' '{
   ORS = gensub(/\n(.)/, " \\1", "g", RT)
   $1 = $1
} 1' file

field1,  field2,  field3,  field4
PO, MISC, yes, "This is an example"
PO, MISC, no, "This is  another example"

Upvotes: 2

Daweo

Reputation: 36450

I would use GNU AWK for this task in the following way. Let the content of file.txt be

field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"

then

awk 'BEGIN{RS="";FPAT=".";OFS=""}{for(i=1;i<=NF;i+=1){cnt+=($i=="\"");if($i=="\n"&&cnt%2){$i=" "}};print}' file.txt

gives output

field1, field2, field3, field4
PO,MISC,yes,"This is an example"

Assumptions: there is never more than 1 newline in succession, and " are never nested.

Explanation: I tell GNU AWK to enter paragraph mode, that is, to treat everything between blank lines as one record; that the field pattern is ., i.e. every character is a field; and that the output field separator is the empty string. Then I iterate over the characters. When I encounter " I increase cnt by 1, which keeps track of whether I am outside "..." or inside "...". When I encounter a newline character and cnt is odd, I am inside, so I swap it for a space character. After all characters are processed, I print them.

(tested in gawk 4.2.1)
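
The quote-counting idea described above does not strictly need FPAT. As a rough sketch, the same parity check can be done line by line in any POSIX awk: count the double quotes seen so far in the current record, and keep appending lines while that count is odd. This is a variation on the idea, not the command above; the function name join_quoted_newlines and the variables buf, quotes and tmp are just illustrative names, and it assumes " is never escaped as "" in a way that leaves a quoted field open.

```shell
# Sketch only: joins physical lines while inside an open "..." field.
# join_quoted_newlines is a hypothetical helper name; reads stdin, writes stdout.
join_quoted_newlines() {
    awk '
    {
        # gsub on a copy returns how many double quotes this line contains
        tmp = $0
        quotes += gsub(/"/, "", tmp)
        # buf accumulates the physical lines of one logical record
        buf = (buf == "" ? $0 : buf " " $0)
        # an even running total means every quoted field is closed
        if (quotes % 2 == 0) { print buf; buf = ""; quotes = 0 }
    }
    END { if (buf != "") print buf }   # flush a record left open at end of input
    '
}
```

Usage would look like `join_quoted_newlines < file.txt > joined.txt` (joined.txt being a made-up output name). Unlike the paragraph-mode version, this one does not care about blank lines between records, but it also replaces each newline inside quotes with exactly one space.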

Upvotes: 2

Luuk

Reputation: 14929

awk -F"," '
   BEGIN{ getline; n=NF; print}
   { split($0,a,FS); 
     while(length(a)<=n){ 
       s=$0; 
       getline; 
       $0=s " " $0; 
       split($0,a,FS); 
     } 
     print $0 }' sample_attachements.txt
  • BEGIN{...} reads the header line, stores its number of fields in the variable n, and prints it.
  • While the number of fields (the length of array a) is at most n, read another line and append it to the current line.
  • print $0 finally prints the (modified) input line.

Upvotes: 2
