Reputation: 27
I simply want to change the delimiter in my CSV file. The file comes from an outside server, and the delimiter looks like this: ^A.
name^Atype^Avalue^A
john^Ab^A500
mary^Ac^A400
jack^Ad^A200
I want to get this:
name,type,value
john,b,500
mary,c,400
jack,d,200
I need to change it to a comma (,) or a tab, but my sed command, despite producing the correct output, does not write to the file.
cat -v CSVFILE | sed -i "s/\^A/,/g"
When I use the line above, it correctly outputs the file delimited by a comma instead of ^A, but it doesn't write to the file.
I also tried like this:
sed -i "s/\^A/,/g" CSVFILE
That does not work either. What am I doing wrong?
Upvotes: 1
Views: 2228
Reputation: 439307
Literal ^A (two characters, ^ and A) is how cat -v visualizes control character 0x1 (ASCII code 1, named SOH (start of heading)). ^A is an example of caret notation to represent unprintable ASCII characters: ^A stands for keyboard combination Control-A, which, when preceded by generic escape sequence Control-V, is how you can create the actual control character in your terminal; in other words, Control-V Control-A will insert an actual 0x1 character.
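To make the difference between the real control character and its caret-notation rendering concrete, here is a minimal sketch. It assumes a POSIX-like shell with printf, cat -v and sed available; the sample record and the variable names are made up for illustration.

```shell
# Build one sample line containing real 0x01 bytes (octal 001).
line=$(printf 'john\001b\001500')

# cat -v renders each 0x01 byte in caret notation.
visible=$(printf '%s\n' "$line" | cat -v)
printf '%s\n' "$visible"        # john^Ab^A500

# Matching the two printable characters ^ and A finds nothing in the raw data ...
raw_attempt=$(printf '%s\n' "$line" | sed 's/\^A/,/g' | cat -v)
printf '%s\n' "$raw_attempt"    # john^Ab^A500 (unchanged)

# ... but it does match once cat -v has turned each byte into those two characters.
piped=$(printf '%s\n' "$line" | cat -v | sed 's/\^A/,/g')
printf '%s\n' "$piped"          # john,b,500
```

This is why the piped command in the question prints the expected result, while the same pattern applied to the file itself changes nothing.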
Incidentally, the logic of caret notation (^<letter>) is: the letter corresponds to the ASCII value of the control character represented; e.g., A corresponds to 0x1, and D corresponds to 0x4 (^D, EOT).
To put it differently: you add 0x40 to the ASCII value of the control character to get the ASCII value of its letter representation in caret notation.
^@ to represent NUL (0x0 characters) and ^? to represent DEL (0x7f) are consistent with this notation, because @ has ASCII value 0x40 (i.e., it comes just before A (0x41) in the ASCII table) and 0x40 + 0x7f constrained to 7 bits (bit-ANDed with the max. ASCII value 0x7f) yields 0x3f, which is the ASCII value of ?.
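The arithmetic just described can be checked in the shell. The following is only a sketch: caret_for is a hypothetical helper name, and it assumes a shell whose arithmetic expansion accepts hexadecimal constants (bash, ksh, zsh and dash do).

```shell
# caret_for (hypothetical helper): print the caret-notation character
# for a control code passed as a number, e.g. 0x01.
caret_for() {
    code=$(( ($1 + 0x40) & 0x7f ))          # add 0x40, keep the low 7 bits
    printf "\\$(printf '%03o' "$code")\n"   # emit the byte with that value
}

caret_for 0x01   # A
caret_for 0x04   # D
caret_for 0x7f   # ?
```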
To inspect a given file for the ASCII values of exotic control characters, you can pipe it to od -c, which represents 0x1 as (octal) 001.
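As a quick illustration of the od -c check, this sketch dumps a made-up sample record instead of a real file; the variable name dump is only for illustration.

```shell
# Dump a sample record: od -c shows printable bytes as themselves
# and other bytes as three-digit octal values.
dump=$(printf 'john\001b\001500\n' | od -c)
printf '%s\n' "$dump"
```

In the dump, each delimiter shows up as 001 between the field values, which confirms that the file contains the single byte 0x1 rather than the two characters ^ and A.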
This implies that, when passing the file to sed directly, you cannot use caret notation and must instead use the actual control character in your s call.
Note that when you type an actual 0x1 character with Control-V Control-A, it will also appear in caret notation - as ^A - but in that case it is just the terminal's visualization of the true control character; while it may look like the two printable characters ^ and A, it is not. Purely visually you cannot tell the difference - which is why using an escape sequence or ANSI C-quoted string to represent the control character is the better choice - see below.
Assuming your shell is bash, ksh, or zsh, the better alternative to using Control-V Control-A is to use an ANSI C-quoted string to generate the 0x1 character: $'\1'
GNU sed also recognizes escape sequence \x01 for 0x1.
Thus, your command should be:
sed -i 's/\x01/,/g' CSVFILE # \x01 only recognized by GNU sed
or, using an ANSI C-quoted string:
sed -i $'s/\1/,/g' CSVFILE
Note: While this form can in principle be used with BSD/OSX sed, the -i syntax is slightly different: you'd have to use sed -i '' $'s/\1/,/g' CSVFILE
The only reason to use sed for your task is to take advantage of in-place updating (-i); otherwise, tr is the better choice - see Ed Morton's answer.
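If you want one command line that sidesteps both the GNU-only \x01 escape and the differing -i syntax, a possible approach is to keep the control character in a shell variable and write through a temporary file. This is a sketch rather than a definitive recipe: the names csv, soh and tmp are hypothetical, and it assumes mktemp is available on your system.

```shell
# Hypothetical sample file with 0x01-delimited fields.
csv=$(mktemp)
printf 'name\001type\001value\njohn\001b\001500\n' > "$csv"

# Hold the real control character in a variable (printf understands octal \001).
soh=$(printf '\001')

# Write to a temporary file, then replace the original; no -i needed.
tmp=$(mktemp)
sed "s/${soh}/,/g" "$csv" > "$tmp" && mv "$tmp" "$csv"

cat "$csv"
# name,type,value
# john,b,500
```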
Upvotes: 3
Reputation: 15461
If this is run on OS X, add an extension to -i to edit in place and keep a backup of the original file:
sed -i.bak "s/^A/,/g" CSVFILE
Or to edit in place without a backup:
sed -i '' "s/^A/,/g" CSVFILE
You can also write the output to a new file by piping from cat, without -i on your sed command:
cat CSVFILE | sed "s/^A/,/g" > output
Make sure you type the ^A as an actual control character: Ctrl+V followed by Ctrl+A
Upvotes: 1
Reputation: 116870
If your sed supports the -i option, you could use it like this:
sed -i.bak -e "s/\^A/,/g" CSVFILE
(This assumes the delimiter in the source file consists of the two characters ^ and A; if ^A is supposed to refer to Control-A, then you will have to make adjustments accordingly, e.g. using 's/\x01/,/g'.)
Otherwise, assuming you want to keep a copy of the original file (e.g. in case the result is not what you expect -- see below), an incantation such as the following can be used:
mv CSVFILE CSVFILE.bak && sed "s/\^A/,/g" CSVFILE.bak > CSVFILE
As pointed out elsewhere, if the source-file separator is Control-A, you could also use tr '\001' , (or tr '\001' '\t' for a tab).
The caution is that the delimiter in the source file might well be used precisely because commas might appear in the "values" that the separator-character is separating. If that is a possibility, then a different approach will be needed. (See e.g. https://www.rfc-editor.org/rfc/rfc4180)
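One possible "different approach" for that case is to quote fields while converting, along the lines of RFC 4180. The following is only a sketch with made-up sample data: it assumes the separator is the 0x01 byte, that no field contains a line break, and that your awk accepts a field separator passed through -v.

```shell
# Convert 0x01-delimited records to CSV, quoting fields that contain
# a comma or a double quote (embedded quotes are doubled).
quoted=$(printf 'name\001note\njohn\001likes a, b\nmary\001said "hi"\n' |
awk -v FS="$(printf '\001')" -v OFS=',' '{
    $1 = $1                          # force the record to be rebuilt with OFS
    for (i = 1; i <= NF; i++) {
        if ($i ~ /[",]/) {           # field needs quoting
            gsub(/"/, "\"\"", $i)    # double any embedded quotes
            $i = "\"" $i "\""        # wrap the field in quotes
        }
    }
    print
}')
printf '%s\n' "$quoted"
# name,note
# john,"likes a, b"
# mary,"said ""hi"""
```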
Upvotes: 1
Reputation: 204015
This is the job tr was created to do:
tr '<control-A>' ',' < file > tmp && mv tmp file
Replace <control-A> with a literal control-A, obviously.
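If typing a literal control-A is awkward, tr also accepts the octal escape '\001', as noted in another answer. Here is a runnable sketch of the same idea; the sample data and the variable names tr_file, tr_tmp and tabbed are made up, and it assumes mktemp is available.

```shell
# Hypothetical sample file with 0x01-delimited fields.
tr_file=$(mktemp)
printf 'name\001type\001value\njohn\001b\001500\n' > "$tr_file"

# Tab-separated variant, captured before the file is rewritten.
tabbed=$(tr '\001' '\t' < "$tr_file")

# Comma-separated variant, written back over the original file.
tr_tmp=$(mktemp)
tr '\001' ',' < "$tr_file" > "$tr_tmp" && mv "$tr_tmp" "$tr_file"

cat "$tr_file"
# name,type,value
# john,b,500
```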
Upvotes: 1