arnpry

Reputation: 1131

Remove Letter After Number and Before Comma

I need to remove any letters that occur after the first comma in a line

some.file

JAN,334X,333B,337A,338D,332Q,335H,331U

Expected Result:

JAN,334,333,337,338,332,335,331

Code:

sed -i 's/\[0-9][0-9][0-9].*,/[0-9][0-9][0-9],/g' some.file

What am I doing wrong?

Upvotes: 2

Views: 445

Answers (6)

Thor

Reputation: 47089

No need for sed; coreutils will do:

paste -d, <(cut -d, -f1 data) <(cut -d, -f2- data | tr -d 'A-Z')

This takes 0.3 seconds on my computer when run on the data file generated in ceving's answer.

Upvotes: 2

ceving

Reputation: 23774

Try this:

$ sed 's/,\([0-9]*\)[^,]*/,\1/g' <<<'JAN,334X,333B,337A,338D,332Q,335H,331U'
JAN,334,333,337,338,332,335,331

You need to capture the digits with escaped parentheses, \( and \), so that the captured string can be used in the replacement as \1. The g flag applies the substitution to every occurrence on the line, not just the first.
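The same substitution can also be written with extended regular expressions (sed -E, accepted by GNU sed and BSD sed), where the grouping parentheses need no backslashes. A minimal sketch on the sample line from the question:

```shell
# ERE form: ([0-9]*) captures the digits after each comma,
# [^,]* swallows any trailing letters up to the next comma (or end of line)
echo 'JAN,334X,333B,337A,338D,332Q,335H,331U' | sed -E 's/,([0-9]*)[^,]*/,\1/g'
# prints JAN,334,333,337,338,332,335,331
```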

Comparison of the different answers

Test data:

$ > data; for ((x=1000000;x>0;x--)); do echo 'JAN,334X,333B,337A,338D,332Q,335H,331U' >> data; done
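The loop above reopens data for every one of the million appends. A sketch of an alternative that builds the same file through a single redirection, and should therefore be considerably quicker to generate (yes repeats its argument until head has taken enough lines):

```shell
# yes prints the line endlessly; head keeps the first 1,000,000 copies
# and the file is opened only once by the > redirection
yes 'JAN,334X,333B,337A,338D,332Q,335H,331U' | head -n 1000000 > data
```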

My answer is the slowest:

$ time sed 's/,\([0-9]*\)[^,]*/,\1/g' < data >/dev/null

real    0m16.368s
user    0m16.296s
sys     0m0.024s

Michael's answer is faster:

$ time sed ':;s/[A-Z],/,/2;t;s/[A-Z]$//' < data >/dev/null

real    0m9.669s
user    0m9.624s
sys     0m0.012s

But Sundeep's is the fastest:

$ time sed 's/[A-Z]//4g' < data >/dev/null

real    0m4.905s
user    0m4.856s
sys     0m0.028s

Upvotes: 2

Sundeep

Reputation: 23667

Since the question is tagged linux, this GNU sed option comes in handy:

$ echo 'JAN,334X,333B,337A,338D,332Q,335H,331U' | sed -E 's/[A-Z](,|$)/\1/2g'
JAN,334,333,337,338,332,335,331
  • 2g means replace from the 2nd match onwards, up to the end of the line
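To see why the 2 matters: with a plain g flag, the very first match (the N of JAN, just before the first comma) is deleted as well. A small comparison on the sample line:

```shell
line='JAN,334X,333B,337A,338D,332Q,335H,331U'
# plain g: every letter before a comma or the end of line goes, including the N of JAN
echo "$line" | sed -E 's/[A-Z](,|$)/\1/g'    # JA,334,333,337,338,332,335,331
# 2g (GNU sed): only matches 2 onwards are replaced, so JAN survives
echo "$line" | sed -E 's/[A-Z](,|$)/\1/2g'   # JAN,334,333,337,338,332,335,331
```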

If the number of letters in the first column is known, this can be simplified to:

$ echo 'JAN,334X,333B,337A,338D,332Q,335H,331U' | sed 's/[A-Z]//4g'
JAN,334,333,337,338,332,335,331

Upvotes: 2

Michael Vehrs

Reputation: 3363

You could also use a small loop (this is GNU sed):

sed ':;s/[A-Z],/,/2;t;s/[A-Z]$//'

Each pass deletes only the second letter that precedes a comma (the first one is the N of JAN), and the loop repeats until no such letter is left. Finally, it deletes the letter at the end of the line, if there is one.
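Newer GNU sed releases may reject a : command with no label name. A sketch of the same loop with an explicit label (the name a is arbitrary), passing each command as its own -e so that the label is terminated properly; this should also work outside GNU sed, though that is an assumption:

```shell
# :a defines the label; ta jumps back to it while the previous s/// succeeded.
# Each pass removes the 2nd "letter before a comma" (the 1st is the N of JAN);
# once none is left, the final s/// strips the letter at the end of the line.
echo 'JAN,334X,333B,337A,338D,332Q,335H,331U' |
  sed -e ':a' -e 's/[A-Z],/,/2' -e 'ta' -e 's/[A-Z]$//'
# prints JAN,334,333,337,338,332,335,331
```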

Upvotes: 4

robert

Reputation: 4867

You should omit the *, and the first \ looks like a mistake, i.e.

sed -i 's/[0-9][0-9][0-9].,/[0-9][0-9][0-9],/g' some.file

but I think you also want to capture the number:

sed -i 's/\([0-9][0-9][0-9]\).,/\1,/g' some.file
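The capturing version above still leaves the letter on the last field (331U), because that letter is followed by the end of the line rather than a comma. One way to cover both cases, sketched with sed -E so that the alternation (,|$) can match either a comma or the end of the line:

```shell
# ([0-9]{3}) keeps the three digits, [A-Z] is the letter to drop,
# (,|$) keeps the following comma, or matches the end of the line
echo 'JAN,334X,333B,337A,338D,332Q,335H,331U' |
  sed -E 's/([0-9]{3})[A-Z](,|$)/\1\2/g'
# prints JAN,334,333,337,338,332,335,331
```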

It would be helpful if you posted your actual output as well.

Upvotes: 2

sat

Reputation: 14949

Some issues are:

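  • Don't escape the [: \[ matches a literal bracket, so your pattern never matches the input.

  • Your replacement value is wrong: the replacement is literal text, not a regex. The syntax is s/regex/replacement/g.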
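The second point is easy to see in isolation: in the replacement part, [0-9] is copied as plain text, so even with the stray backslash removed, the question's command writes the brackets into the output:

```shell
# Bracket expressions mean nothing in the replacement; they are copied verbatim.
# Only things like & and back-references (\1 ... \9) are special there.
echo '334X,' | sed 's/[0-9][0-9][0-9].,/[0-9][0-9][0-9],/'
# prints [0-9][0-9][0-9],
```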

Use this:

sed -e 's/\([0-9]\+\)[a-zA-Z],/\1,/g' -e 's/\([0-9]\+\)[a-zA-Z]$/\1/g' file

Upvotes: 2
