Phantom
Phantom

Reputation: 7

Replace delimiter in the file

Let's take

"apple" , "tomato,potato", 100, 250, "banana" 

is my data I want replace the outer delimiter i.e. , which is same as inner delimiter. Ultimately, want my data like :

"apple" ; "tomato,potato"; 100; 250; "banana"

and the position of the string with delimeter can be anywhere in data.

I have tried many chatGPT sed commands all are not working

Upvotes: -5

Views: 317

Answers (7)

Cyril Chaboisseau
Cyril Chaboisseau

Reputation: 478

Here is one solution with DuckDB on a larger example (2 lines)

$ cat /tmp/test.csv 
"apple" , "tomato,potato", 100, 250, "banana"
"grapes","peas,beans",150,300,kiwi

the result from the DuckDB prompt

D COPY (from '/tmp/test.csv') TO '/dev/stdout' (HEADER 0, delimiter ';');
apple;tomato,potato;100;250;banana
grapes;peas,beans;150;300;kiwi

You can also use read_csv function for enhanced CSV parsing (RFC 4180 compliant)

$ duckdb << EoS
COPY (from read_csv('/tmp/test.csv')) TO '/tmp/test_semicolumn.csv' (HEADER 0, delimiter ';');
EoS

will return the same result on a new file

Upvotes: 0

Southernal
Southernal

Reputation: 167

The problem here is probably that you are missing the "inplace" parameter -i.

sed -i -E 's/( *\, *)((([^,"\,])+)|("[^"]+"))/; \2/g' nameofthefile.csv 

this will substitute all occurencies of "," with ";" in the file without change them if they are inside quotes. Please change directory to the path of the file.

If you want to do this replacement for multiple files csv in the same path use this command:

    sed -i -E 's/( *\, *)((([^,"\,])+)|("[^"]+"))/; \2/g' *.csv  

Upvotes: -2

potong
potong

Reputation: 58518

This might work for you (GNU sed):

sed -E ':a;s/^(("[^,"]*"[^"]*)*"[^,"]*),/\1\n/;ta;y/\n,/,;/' file

First convert any commas inside double quotes to newlines.

Then transliterate newlines to commas and commas to semi-colons.

An alternate solution (kudos to Walter A):

sed -E 's/(("[^"]*"|)[^",]*),/\1;/g' file

Upvotes: 2

Walter A
Walter A

Reputation: 20032

When the input has an even number of double quotes, you can match a quoted field with ("[^"]*") and the following substring until the next , with ([^",]*).
When you also want to replace delimiters between fields without quotes, add an * after the regex for a quoted field.

echo '"apple" , "tomato,potato", 100, 250, "banana" ' | sed -E 's/("[^"]*")*([^",]*),/\1\2;/g'
# or with a file
sed -E 's/("[^"]*")*([^",]*),/\1\2;/g' file

Result

"apple" ; "tomato,potato"; 100; 250; "banana"

Upvotes: 2

Ed Morton
Ed Morton

Reputation: 204381

I wouldn't normally answer a question that shows no attempt by the OP but since there are multiple answers posted already....

Your input isn't valid CSV and so YMMV trying to use a tool that understands CSV on it.

Using any awk in any shell on every Unix box you could treat the input as "-separated fields instead of ,-separated fields and replace the ,s in every odd-numbered "field":

$ awk 'BEGIN{FS=OFS="\""} {for (i=1; i<=NF; i+=2) gsub(/,/,";",$i)} 1' file
"apple" ; "tomato,potato"; 100; 250; "banana"

If that's not all you need then see whats-the-most-robust-way-to-efficiently-parse-csv-using-awk for ideas if you want to use awk for this.

Upvotes: 1

glenn jackman
glenn jackman

Reputation: 247092

Some implementations of CSV parsers may have trouble with that "sloppy" CSV. While the spaces around commas are OK, the fact that the quotes are not immediately next to the comma isn't OK. For example,

  • the ruby CSV module throws a MalformedCSVError "Any value after quoted field isn't allowed"

    $ echo '"apple" , "tomato,potato", 100, 250, "banana"' | ruby -rcsv -e 'p CSV.parse(STDIN.readline)'
    /home/linuxbrew/.linuxbrew/Cellar/ruby/3.3.2/lib/ruby/3.3.0/csv/parser.rb:1066:in `parse_quotable_robust': Any value after quoted field isn't allowed in line 1. (CSV::MalformedCSVError)
    
  • python:

      $ python3
      Python 3.12.3 (main, Apr  9 2024, 08:09:14) [GCC 11.4.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import csv
      >>> line = '"apple" , "tomato,potato", 100, 250, "banana"'
      >>> for fields in csv.reader([line]):
      ...   print(fields)
      ...
      ['apple ', ' "tomato', 'potato"', ' 100', ' 250', ' "banana"']
    
  • the bash5.2 dsv module does the same thing as python: the "tomato,potato" value is not considered one field because there is a space before the opening quote, so the quotes don't have special meaning.

Upvotes: 1

Raymond
Raymond

Reputation: 17

You can simply replace , with ; using this sed command sed 's/, /; /g'.

for example:

echo '"apple" , "tomato,potato", 100, 250, "banana"' | sed 's/, /; /g'                                                                                     

output:

"apple" ; "tomato,potato"; 100; 250; "banana"

Upvotes: -1

Related Questions