Fernando Bonetti

Reputation: 27

Sed is not writing to file

I simply want to change the delimiter in my CSV file. The file comes from an outside server, and the delimiter shows up as ^A:

name^Atype^Avalue^A
john^Ab^A500
mary^Ac^A400
jack^Ad^A200

I want to get this:

name,type,value
john,b,500
mary,c,400
jack,d,200

I need to change it to a comma (,) or a tab, but my sed command, although it prints the correct output, does not write to the file.

cat -v CSVFILE | sed -i "s/\^A/,/g"

When I use the line above, it correctly prints the file delimited by commas instead of ^A, but it doesn't write to the file.

I also tried like this:

sed -i "s/\^A/,/g" CSVFILE

That doesn't work either. What am I doing wrong?

Upvotes: 1

Views: 2228

Answers (4)

mklement0

Reputation: 439307

  • Literal ^A (two characters, ^ and A) is how cat -v visualizes control character 0x1 (ASCII code 1, named SOH (start of heading)). ^A is an example of caret notation to represent unprintable ASCII characters:

    • ^A stands for keyboard combination Control-A, which, when preceded by generic escape sequence Control-V, is how you can create the actual control character in your terminal; in other words, Control-V Control-A will insert an actual 0x1 character.

    • Incidentally, the logic of caret notation (^<letter>) is: the letter corresponds to the ASCII value of the control character represented; e.g., A corresponds to 0x1, and D corresponds to 0x4 (^D, EOT).
      To put it differently: you add 0x40 to the ASCII value of the control character to get the ASCII value of its letter representation in caret notation.
      ^@ for NUL (0x0) and ^? for DEL (0x7f) are consistent with this notation: 0x0 + 0x40 is 0x40, the ASCII value of @ (which comes just before A (0x41) in the ASCII table), and 0x40 + 0x7f constrained to 7 bits (bit-ANDed with the max. ASCII value 0x7f) yields 0x3f, which is the ASCII value of ?.

    • To inspect a given file for the ASCII values of exotic control characters, you can pipe it to od -c, which represents 0x1 as (octal) 001.

  • This implies that, when passing the file to sed directly, you cannot use caret notation and must instead use the actual control character in your s call.

    • Note that when you use Control-V Control-A to create an actual 0x1 character, it will also appear in caret notation, as ^A, but in that case it is just the terminal's visualization of the true control character; while it may look like the two printable characters ^ and A, it is not. Purely visually, you cannot tell the difference, which is why using an escape sequence or an ANSI C-quoted string to represent the control character is the better choice; see below.

  • Assuming your shell is bash, ksh, or zsh, the better alternative to using Control-V Control-A is to use an ANSI C-quoted string to generate the 0x1 character: $'\1'

    • However, as Lars Fischer points out in a comment on the question, GNU sed also recognizes escape sequence \x01 for 0x1.
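
The points above can be checked on a throwaway file. This is only a minimal sketch, assuming a POSIX shell with cat -v and od available; the file name sample.csv and its contents are made up for illustration.

```shell
# Build a small sample file whose fields are separated by the real 0x01 byte.
# (sample.csv and the temp directory are placeholders, not from the question.)
workdir=$(mktemp -d)
sample="$workdir/sample.csv"
printf 'name\001type\001value\njohn\001b\001500\n' > "$sample"

# cat -v visualizes the control character in caret notation.
cat -v "$sample"

# od -c shows the same byte as the octal escape 001.
od -c "$sample"

# A pattern written for the two printable characters ^ and A does not match
# the real control character, so the separators are still there afterwards.
sed 's/\^A/,/g' "$sample" | cat -v
```

The last command is the reason the original attempt appeared to work only behind cat -v: the \^A pattern matches the visualization, not the byte stored in the file.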

Thus, your command should be:

sed -i 's/\x01/,/g' CSVFILE    # \x01 only recognized by GNU sed

or, using an ANSI C-quoted string:

sed -i $'s/\1/,/g' CSVFILE  

Note: While this form can in principle be used with BSD/OSX sed, the -i syntax is slightly different: you'd have to use sed -i '' $'s/\1/,/g' CSVFILE
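
If the same command has to run with both GNU and BSD sed, one possible approach is to attach a backup suffix directly to -i (a form both accept) and to produce the 0x1 byte with printf instead of \x01 or $'...'. This is a sketch under those assumptions; the variable names and the sample data are illustrative only.

```shell
# Sample input in a scratch directory (names are placeholders).
workdir=$(mktemp -d)
csv="$workdir/CSVFILE"
printf 'name\001type\001value\njohn\001b\001500\n' > "$csv"

# Produce the actual 0x01 byte without relying on $'...' or GNU-only \x01.
soh=$(printf '\001')

# -i.bak (suffix attached to -i) edits in place and keeps the original
# as CSVFILE.bak.
sed -i.bak "s/${soh}/,/g" "$csv"

cat "$csv"
```

Once the result looks right, the CSVFILE.bak copy can be deleted.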


The only reason to use sed for your task is to take advantage of in-place updating (-i); otherwise, tr is the better choice - see Ed Morton's answer.

Upvotes: 3

SLePort

Reputation: 15461

If this is run under OS X:

  • Add an extension to -i to edit the file in place and keep a backup copy with that extension:

    sed -i.bak "s/^A/,/g" CSVFILE
    
  • Or, to edit in place without a backup:

    sed -i '' "s/^A/,/g" CSVFILE
    
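  • You can also write to a new file with cat and a redirection, without -i on your sed command (and without -v on cat, which would turn the control character into the two printable characters ^ and A):

    cat CSVFILE | sed "s/^A/,/g" > output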
    

Make sure you enter the ^A in these commands as the actual control character, by typing:

Ctrl+V, then Ctrl+A

Upvotes: 1

peak

Reputation: 116870

If your sed supports the -i option, you could use it like this:

sed -i.bak -e "s/\^A/,/g" CSVFILE

(This assumes the delimiter in the source file consists of the two characters ^ and A; if ^A is supposed to refer to Control-A, then you will have to make adjustments accordingly, e.g. using 's/\x01/,/g'.)

Otherwise, assuming you want to keep a copy of the original file (e.g. in case the result is not what you expect -- see below), an incantation such as the following can be used:

mv CSVFILE CSVFILE.bak  &&  sed "s/\^A/,/g" CSVFILE.bak > CSVFILE

As pointed out elsewhere, if the source-file separator is Control-A, you could also use tr '\001' , (or tr '\001' '\t' for a tab).

The caution is that the delimiter in the source file might well be used precisely because commas might appear in the "values" that the separator-character is separating. If that is a possibility, then a different approach will be needed. (See e.g. https://www.rfc-editor.org/rfc/rfc4180)
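
If commas can occur in the values, one possible approach is to quote such fields while converting. The sketch below uses awk (not sed or tr) and follows the RFC 4180 convention of wrapping a field in double quotes and doubling any embedded quotes. It assumes the separator is the real Control-A byte and that no field contains a newline; the sample rows are invented for illustration.

```shell
workdir=$(mktemp -d)
src="$workdir/CSVFILE"
# One value contains a comma, another contains double quotes.
printf 'name\001type\001value\nsmith, john\001b\001500\nmary "m"\001c\001400\n' > "$src"

soh=$(printf '\001')    # the actual 0x01 byte

awk -v FS="$soh" -v OFS=',' '
{
    $1 = $1                          # force the record to be rebuilt with OFS
    for (i = 1; i <= NF; i++) {
        if ($i ~ /[",]/) {           # this field needs quoting
            gsub(/"/, "\"\"", $i)    # double any embedded quotes
            $i = "\"" $i "\""
        }
    }
    print
}' "$src" > "$workdir/out.csv"

cat "$workdir/out.csv"
```

Fields without commas or quotes pass through unchanged, so a file that never needed quoting gives the same result as the plain substitution.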

Upvotes: 1

Ed Morton

Reputation: 204015

This is the job tr was created to do:

tr '<control-A>' ',' < file > tmp && mv tmp file

Replace <control-A> with a literal Control-A character, obviously (or write it as the octal escape '\001', which tr understands).
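
A runnable variant of the same idea, written with the octal escape so that no literal control character has to be typed. This is a sketch only: the scratch directory, the sample data, and the use of mktemp for the temporary file are assumptions added for illustration.

```shell
workdir=$(mktemp -d)
file="$workdir/CSVFILE"
printf 'name\001type\001value\njohn\001b\001500\n' > "$file"

# Write the converted data to a temporary file first, then replace the
# original only if tr succeeded. Use '\t' instead of ',' for a tab.
tmp=$(mktemp "$workdir/tmp.XXXXXX")
tr '\001' ',' < "$file" > "$tmp" && mv "$tmp" "$file"

cat "$file"
```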

Upvotes: 1
