Reputation: 21
I have a CSV file in which I need to replace every occurrence of a double quote immediately followed by a line feed with a string, i.e. "XXXX".
I've tried the following:
LC_CTYPE=C && LANG=C && sed 's/\"\n/XXXX/g' < input_file.csv > output_file.csv
and
LC_CTYPE=C && LANG=C && sed 's/\"\n\r/XXXX/g' < input_file.csv > output_file.csv
I also tried:
sed 's/\"\n\r/XXXX/g' < input_file.csv > output_file.csv
In each case, the command does not seem to recognize the specific combination of "\n in the file.
It works if I look for just the double quote:
sed 's/\"/XXXX/g' < input_file.csv > output_file.csv
and if I look for just the line feed:
sed 's/\n\r/XXXX/g' < input_file.csv > output_file.csv
But I have had no luck with the find-and-replace for the combined regex string.
Any guidance would be most appreciated.
Adding simplified sample data
Sample input data (header row and two example records):
column1,column2
data,data<cr>
data,data"<cr>
Sample output:
column1,column2
data,data<cr>
data,dataXXXX
Update: Having some luck using perl commands in bash (MacOS) to get this done:
perl -pe 's/\"/XXXX/' input.csv > output1.csv
then
perl -pe 's/\n/YYYY/' output1.csv > output2.csv
This results in XXXXYYYY at the end of each record.
I'm sure there is an easier way, but this seems to do the trick on a test file. I'm trying it there before I use it on the original 200K-line CSV file.
Upvotes: 1
Views: 145
Reputation: 204558
sed is meant for simple substitutions on individual lines; this task spans a line boundary, so it is not a job for sed.
It sounds like this is what you want (uses GNU awk for multi-char RS):
$ awk -v RS='"\n' -v ORS='XXXX' '1' file
column1,column2
data,data
data,dataXXXX$
That final $ above is my prompt, demonstrating that both the " and the subsequent newline have been replaced.
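If GNU awk is not at hand (the awk shipped with some systems may not accept a multi-character RS), a line-by-line variant along these lines might do instead. It is a sketch, not a drop-in equivalent: awk_sample.csv and awk_out.csv are made-up names, and the \r? is only there in case the file has CRLF line endings.

```shell
# Hypothetical sample file mirroring the question's data.
printf 'column1,column2\ndata,data\ndata,data"\n' > awk_sample.csv

# sub() returns 1 when the line ended in " (optionally followed by CR);
# such lines are printed without a trailing newline, all others as usual.
awk '{ if (sub(/"\r?$/, "XXXX")) printf "%s", $0; else print }' awk_sample.csv > awk_out.csv
```

One difference from the RS/ORS approach: here XXXX is written only where a quote really precedes the line break, whereas ORS is appended after every record, including the last one.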
Upvotes: 3
Reputation: 89639
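You can try something like this (GNU sed; \? and \| are GNU extensions):
sed ':a;/"\r\?$/{$!N;s/"\r\?\n\|"\r\?$/XXXX/;ba;}'
details:
:a # define the label "a"
/"\r\?$/ # condition: if the line ends with " (optionally followed by CR) then:
{
$!N # append the next line to the pattern space, unless this is the last line
# (a bare N on the last line makes GNU sed print the pattern space
# and quit before the substitution runs)
s/ # replace:
"\r\?\n # the " and the LF (or CRLF)
\|
"\r\?$ # or a " at the end of the pattern space
# (this second alternative only matters on the last line
# of the file)
/XXXX/ # with XXXX
ba # go to label a
}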
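For a sed without the GNU extensions (the question mentions macOS), the same loop could be spelled with separate -e fragments and plain basic regular expressions. Treat this as a sketch: it assumes LF line endings (there is no \r handling), sed_sample.csv and sed_out.csv are made-up names, and $!N is used so that the last line still reaches the substitution instead of ending the script at N.

```shell
# Hypothetical sample file mirroring the question's data.
printf 'column1,column2\ndata,data\ndata,data"\n' > sed_sample.csv

# :a          label for the loop
# /"$/{ ... } only act when the pattern space ends with "
# $!N         append the next line, except on the last line
# s/"\n/XXXX/ replace the quote plus the embedded newline
# $s/"$/XXXX/ on the last line, replace a trailing quote
# ba          loop in case the appended line also ends with "
sed -e ':a' \
    -e '/"$/{' \
    -e '$!N' \
    -e 's/"\n/XXXX/' \
    -e '$s/"$/XXXX/' \
    -e 'ba' \
    -e '}' sed_sample.csv > sed_out.csv
```

Unlike the awk output, sed still ends the final line with a newline after XXXX.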
Upvotes: 1