Reputation: 1
I have a CSV file which contains double quote inside the data.
EI_SS EI_HDR,"Test FileReport, for" Testing" EI_DT,tx,tx,tx,tx,tx,tx,tx,tx,tx,tx,tx,tx,tx,tx,tx,dt8,tx EI_COL,"Carrier, Name","Carrier ID","Account Name","Account ID","Group Name","Group ID","Action Code","File ID","Contract","PBP ID","Response Status","Claim Number","Claim Seq","HICN","Cardholder ID","Date of Service","NDC" "Test Carrier" ,"MPDH5427" ,"1234" ,"CSNP" ,"TestD" Test" ,"FH14077" ,"" ,"PD14079042" ,"H5427" ,"077" ,"REJ" ,"133658279751004" ,"999" ,"304443938A" ,"P0002067501" ,01/01/2014,"50742010110" ,"C" "Test, Carrier1" ,"BCRIMA" ,"Carrier" ,"14" ,"123333" ,"00000MCA0014001" ,"" ,"PD14024142" ,"H4152" ,"013" ,"REJ" ,"133658317280023" ,"999" ,"035225520A" ,"ZBM200416667" ,01/01/2014,"00378350505"
The Updated Actual CSV
Now I want to remove the inner quotes from these data but need to keep outer double quotes for each data.
For processing file, I have used "\"[a-zA-Z0-9 ]+[,][ a-zA-Z0-9]+\""
pattern to split the file. But if there is any inner quote in any row then the code breaks.
I need to convert this into XLSX by keeping comma's and replacing inner quotes (if not possible then removing those inner quotes.
Please help me to solve this issue.
Upvotes: 0
Views: 473
Reputation: 93
If your data is having just one double quote: ,"abc "def", - Following should help:
test.txt "abc","def"gh","ijk" "lmn","o"pq","rst"
sed -i 's/([^,])\"([^,])/\1\"\"\2/g' test.txt
Command above looks for a set of 3 characters which matches a pattern - ?"? where ? is anything but a comma. Implies - search for 3 characters which is not like ,", and replace " with ""
Command split: ([^,]) - character that is not a comma - () are for remembering this character \" - Double quote \1 - First character which is remembered \2 - Second character which is remembered.
Note: This does not work if you have two double quotes in the encapsulated. The above command does not escape " in ,"a"b"cc",
Hope this helps a bit.
Upvotes: 0
Reputation: 2771
first of all why don't you use the proper char of regex ?
actually there is a char : \w
which means [a-zA-Z_0-9]
instead of your [a-zA-Z0-9]
(quite the same just adding _ but much more readable I think ^^)
For your pattern, as other said, the best way is to correct the way you generate the csv at first ;)
Upvotes: 0
Reputation: 8068
I think it is not possible, because the way you demarcate two values is ambiguous. For example, how to split the following value?
""I am", "a single", ", value""
Is it meant to be:
I am
a single
, value
or
I am
a single, , value
or even
I am, a single, , value
?
Upvotes: 2