Michele

Reputation: 681

Delete all weird characters from a text file

I'm trying to use the sed command to clean up a .txt file:

sed -i.bak -e 's@^[A-Za-z0-9_.;,:]+$@@g' *.txt

returns

sed: RE error: illegal byte sequence

What am I doing wrong with the regular expression? What I mean to say is: replace everything that isn't A-Za-z0-9_.;,: with "".

Upvotes: 0

Views: 449

Answers (3)

Michele

Reputation: 681

Glenn Jackman was right; the solution found in the other post helped.

The only problem is that the command now only recognizes English Latin characters, so it won't work on this file.

Here's the result; nothing changed:

ÁÉc†ÿ°“Å9,0,sub,,0,0,0,,Pero, aun no comprendo porque quer√≠a acabar conÄC∂u⁄ÁÉx¨†ú°ñÅ996,0,sub,,0,0,0,,õÇ–†µ°ØÅ*10,0,sub,,0,0,0,,Ha deshonrado aléC∂u⁄ÁÉ©≤†”°ÕÅ11,0,sub,,0,0,0,,{\pos(1481.142,795.974)\bord0\fad(800,0)}Himalayan RangeõÇ!¸C∂u@óÁÉf†”°ÕÅ12,0,sub,,0,0,0,,¬øEsta seguro que querer hacerlo solo?, se√±or MitsumazaõÇ»†ª°µÅî13,0,sub,,0,0,0,,Silencio Tatsumi, tranquil√≠zateõÇ2C∂u@ôÁÉ,†©°£Å14,0,sub,,0,0,0,,Pero se√±or...õÇ\†≠°ßÅ\15,0,sub,,0,0,0,,Aunque lo digas...õÇ<†∏°≤Åò16,0,sub,,0,0,0,,Tengo un esp√≠ritu aventureroõÇ|C∂u@£ÁÉ@†∞°™Å17,0,sub,,0,0,0,,Lo entiendo se√±or...õÇ–†≤°¨Å–18,0,sub,,0
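
For anyone hitting the same "illegal byte sequence" error: a common workaround (possibly, though not necessarily, the one from the other post) is to run sed under the C locale so that it treats the input as raw bytes. The sketch below uses a made-up sample line and a hypothetical helper name, `clean_bytes`. It also shows the side effect mentioned above: accented letters stored as UTF-8 are multi-byte, so they are stripped along with the junk.

```shell
# Hypothetical sample: "señor" in UTF-8 followed by two bytes that are
# not valid UTF-8 (octal 377 and 376). Not taken from the original file.
sample=$(printf 'se\303\261or \377\376 Tatsumi')

clean_bytes() {
  # LC_ALL=C makes sed work byte by byte, so invalid sequences no longer
  # raise "illegal byte sequence". The space inside the brackets keeps
  # word separation.
  LC_ALL=C sed -e 's/[^A-Za-z0-9_.;,: ]//g'
}

printf '%s\n' "$sample" | clean_bytes
# prints "seor  Tatsumi": the junk bytes are gone, but so is the "ñ"
```

If the accented characters have to survive, the file probably needs to be converted to a known encoding first rather than filtered byte by byte.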

Upvotes: 0

repzero

Reputation: 8412

Let's say you have something like this in a file named "my_file":

Location: http://www.google.gy/?gws_rd=cr&ei=l_KIVOXnIsinNq2NgsgB [following]
--2014-12-10 21:25:44--  http://www.google.gy/?gws_rd=cr&ei=l_KIVOXnIsinNq2NgsgB
Resolving www.google.gy (www.google.gy)... 64.233.176.94, 2607:f8b0:4002:c05::5e
Connecting to www.google.gy (www.google.gy)|64.233.176.94|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html.2'

You can try

sed -i.bak -e 's#[^[:alnum:].;,:]##g'  'my_file'

This deletes every character that is not alphanumeric, ".", ";", "," or ":". Note that spaces are removed as well. Result:

Location:http:www.google.gygwsrdcreilKIVOXnIsinNq2NgsgBfollowing
2014121021:25:44http:www.google.gygwsrdcreilKIVOXnIsinNq2NgsgB
Resolvingwww.google.gywww.google.gy...64.233.176.94,2607:f8b0:4002:c05::5e
Connectingtowww.google.gywww.google.gy64.233.176.94:80...connected.
HTTPrequestsent,awaitingresponse...200OK
Length:unspecifiedtexthtml
Savingto:index.html.2
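
If losing the spaces is not what you want, one possible variation is to add a literal space to the bracket expression. This is only a sketch: the helper name `keep_readable` is made up, and `LC_ALL=C` is an assumption added here to sidestep locale-dependent matching.

```shell
keep_readable() {
  # Same negated bracket expression as above, plus a literal space
  # so that words stay separated.
  LC_ALL=C sed -e 's#[^[:alnum:].;,: ]##g'
}

printf '%s\n' 'Length: unspecified [text/html]' | keep_readable
# prints "Length: unspecified texthtml"
```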

Upvotes: 0

Gilles Quénot

Reputation: 185106

You put the ^ in the wrong place. It belongs inside the brackets, like this:

sed -i.bak -e 's@[^A-Za-z0-9_\.;,:]\+$@@g' *.txt

And note the small changes (backslash-escaping some special characters).
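
To see why the position of the ^ matters, here is a small side-by-side sketch. The sample line and the function names `anchored` and `negated` are made up for illustration. It uses `*` instead of `+` and drops the trailing `$` in the second pattern so that it behaves the same in different sed implementations and removes unwanted characters anywhere in the line.

```shell
# Hypothetical sample line, not from the original files.
line='Hola, mundo: 42 #!'

anchored() {
  # ^ outside the brackets anchors the match to the start of the line:
  # this only matches a line made up entirely of allowed characters.
  LC_ALL=C sed -e 's@^[A-Za-z0-9_.;,:]*$@@'
}

negated() {
  # ^ as the first character inside the brackets negates the set:
  # every character that is NOT listed gets deleted.
  LC_ALL=C sed -e 's@[^A-Za-z0-9_.;,:]@@g'
}

printf '%s\n' "$line" | anchored   # prints "Hola, mundo: 42 #!" (no match, unchanged)
printf '%s\n' "$line" | negated    # prints "Hola,mundo:42"
```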

Upvotes: 1
