jawz_34
jawz_34

Reputation: 31

How do I remove non numeric characters from only one column of a .csv file in bash?

I have a large .csv file containing inconsistent data that looks something like this:

O_T_11c,234858DEF8,OHL11_c_234858DEF8.csv,c
Ohl12und,234858DEF8,OHL12_u234858DEF8.csv,u
Ohlabx13und,234858DEF8,OHL13u_234858DEF8.csv,u
Ohl14und,234858DEF8,OHL14u_234858DEF8.csv,u

I want to remove the all letters only from only the first column and leave the rest alone. I've attempted to do this using a regex in awk, but this gives me strange output. I want my output to ultimately be this:

11,234858DEF8,OHL11_c_234858DEF8.csv,c
12,234858DEF8,OHL12_u234858DEF8.csv,u
13,234858DEF8,OHL13u_234858DEF8.csv,u
14,234858DEF8,OHL14u_234858DEF8.csv,u

I understand I can use a sed like this

sed 's/[^0-9]*//g'

To remove all non-numeric characters, but I only want to apply this to the first column.

Upvotes: 1

Views: 3596

Answers (2)

alvits
alvits

Reputation: 6758

Now that we know the actual input is comma delimited, here's the quickest solution to your question.

sed -i 's/^[^0-9]\+//;s/[^0-9,]\+//' largefile.csv

s/^[^0-9]\+// - will remove any non-numeric from the start of the line.

s/[^0-9,]\+// - will remove any non-numeric trailing the numbers just before the comma.

Upvotes: 2

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

Short awk approach:

awk -F, '{ gsub(/[^0-9]/,"",$1) }1' OFS=',' file

The output:

11,234858DEF8,OHL11_c_234858DEF8.csv,c
12,234858DEF8,OHL12_u234858DEF8.csv,u
13,234858DEF8,OHL13u_234858DEF8.csv,u
14,234858DEF8,OHL14u_234858DEF8.csv,u

Upvotes: 2

Related Questions