Reputation: 1
I have a comma delimited text file where one of the columns (appropriately) has text encased with double quotes. There are also many instances of double quotes within the content of this particular column. I've used the following to remove many of the double quotes, replacing them with single quotes (excluding any double quotes next to a comma).
(?<!^)(?<![,])"(?![,])(?!$)
How do I isolate/replace the double quote after [fine,] without removing the "good" double quotes?
column1,"he's doing 'fine," says Tom, but nothing specific. Blah, blah, blah", column3
Here is another example of "good" double quotes that I don't want to remove (where the first two columns are blank/empty):
,,"This is text I need",
Upvotes: 0
Views: 1813
Reputation: 976
Not familiar with Notepad++, but reading other answers I assume there is a way to use regex. If so, you can use this one:
(?<!^|",)"(?!,"|$)
Upvotes: 0
Reputation: 31
I struggled with this a bit, but based on your question there may be a workable solution. If only one column contains unescaped quotes or commas, you can count the commas before that column and the commas after it, then strip all the quotes and commas in between. If several columns contain unescaped characters, this becomes harder.
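The comma-counting idea can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `fix_line` and its parameters `cols_before` and `cols_after` are hypothetical, every line is assumed to contain at least that many clean columns on each side, and inner double quotes are turned into single quotes (as the question asks) rather than stripped outright.

```python
def fix_line(line: str, cols_before: int, cols_after: int) -> str:
    """Locate the one messy column by counting commas from both ends of the line.

    cols_before / cols_after are the numbers of clean columns to the left and
    right of the messy column (hypothetical parameters for this sketch).
    """
    # Split off the clean leading columns; whatever is left starts the messy column.
    head = line.split(",", cols_before)
    rest = head.pop()

    # Split off the clean trailing columns, counting commas from the right.
    tail = rest.rsplit(",", cols_after)
    messy = tail.pop(0)

    # Keep the enclosing quotes and turn every inner double quote into a single quote.
    if len(messy) >= 2 and messy.startswith('"') and messy.endswith('"'):
        messy = '"' + messy[1:-1].replace('"', "'") + '"'

    return ",".join(head + [messy] + tail)
```

Commas inside the messy column are left alone here, because the enclosing double quotes already protect them once the inner quotes are gone.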
Upvotes: 0
Reputation: 14047
Assuming that double quotes only occur in one column, I suggest a two-step approach. First, change all double quotes in the file to single quotes using a simple replace-all. Next, change the first and last single quotes on each line back to double quotes. This can be done in one regex: replace
^([^\r\n']*)'(.*)'([^\r\n']*)$
with
\1"\2"\3
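For anyone scripting this outside an editor, the two steps might look like the sketch below in Python. The function name `requote` is made up for illustration, the text is assumed to use `\n` line endings (as when a file is read in text mode), and the pattern anchors the line start once, outside the capture groups.

```python
import re

# First and last single quote on a line; group 1 cannot contain a quote,
# so it stops at the first one, while the greedy middle runs to the last one.
FIRST_AND_LAST = re.compile(r"^([^\r\n']*)'(.*)'([^\r\n']*)$", re.MULTILINE)


def requote(text: str) -> str:
    """Two-step fix: flatten every double quote, then restore the outer pair."""
    # Step 1: change all double quotes to single quotes.
    text = text.replace('"', "'")
    # Step 2: change the first and last single quote on each line back.
    return FIRST_AND_LAST.sub(r'\1"\2"\3', text)
```

Lines that contain fewer than two quotes do not match the pattern and pass through with only step 1 applied.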
If single quotes occur in other columns and these should not be altered, a three-step approach can be used. Choose a character that does not occur anywhere in the text; I will use ! as an example. Change all double quotes to that character. Then, as above, change the first and last ! on each line to double quotes. This can be done in one regex: replace
^([^\r\n!]*)!(.*)!([^\r\n!]*)$
with
\1"\2"\3
Finally, change all the remaining ! to single quotes. If you cannot find an unused character, you can use a longer string that is not in the file instead, perhaps something like _<<abc>>_ (the character classes in the regex then need adjusting, because a class only matches single characters).
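A rough Python sketch of the three-step variant is shown below. The function name `requote_with_placeholder` and its `placeholder` parameter are hypothetical, and `\n` line endings are assumed. Instead of a negated character class it uses a lazy first group, which is a substitution on my part so that a multi-character placeholder such as _<<abc>>_ also works.

```python
import re


def requote_with_placeholder(text: str, placeholder: str = "!") -> str:
    """Three-step fix that leaves single quotes in other columns untouched."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; pick another")

    # Step 1: park every double quote on the placeholder.
    text = text.replace('"', placeholder)

    # Step 2: restore the first and last placeholder on each line.
    # The lazy first group stops at the first placeholder; the greedy
    # middle group extends to the last one on the same line.
    p = re.escape(placeholder)
    outer = re.compile(rf"^(.*?){p}(.*){p}(.*)$", re.MULTILINE)
    text = outer.sub(r'\1"\2"\3', text)

    # Step 3: the placeholders that remain were inner quotes.
    return text.replace(placeholder, "'")
```

The up-front check matters: if the placeholder already appears in the data, step 3 would silently turn those occurrences into single quotes.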
Upvotes: 0