JochenDB

Reputation: 587

Filter numeric values from certain columns using bash

I have files that look like this:

1;1;Happy Feet;WB;€546,353;-32.3%;;121;-;€4,515;€2,841,113;5;Australia, USA;:
2;3;The Departed;WB;€435,830;-34.8%;;85;-;€5,127;€1,149,495;2;Unknown;:
3;2;Eragon;Fox;€412,229;-41.6%;;90;-;€4,580;€1,752,715;3;UK, USA;:
....

These files are produced by using tail to cut off the first 6 lines:

sudo tail -n+7 filename

Is it possible to retain only the numeric values in columns 5, 10 and 11, replacing the contents of those columns with the integer values? I was thinking about awk and sed, but I have absolutely no experience with these tools.

The goal would be to do this all in one command and write a file that looks like this:

1;1;Happy Feet;WB;546353;-32.3%;;121;-;4515;2841113;5;Australia, USA;:
2;3;The Departed;WB;435830;-34.8%;;85;-;5127;1149495;2;Unknown;:
3;2;Eragon;Fox;412229;-41.6%;;90;-;4580;1752715;3;UK, USA;:

Upvotes: 1

Views: 460

Answers (2)

devnull

Reputation: 123608

You could use awk:

awk -F';' '{gsub("[^0-9.]", "", $5);gsub("[^0-9.]", "", $10);gsub("[^0-9.]", "", $11)}1' OFS=';' inputfile

For your input, it'd produce:

1;1;Happy Feet;WB;546353;-32.3%;;121;-;4515;2841113;5;Australia, USA;:
2;3;The Departed;WB;435830;-34.8%;;85;-;5127;1149495;2;Unknown;:
3;2;Eragon;Fox;412229;-41.6%;;90;-;4580;1752715;3;UK, USA;:
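One detail worth checking: the pattern `[^0-9.]` keeps dots as well as digits. The sample columns contain no dots, so the output above is unaffected, but a value with a decimal part would keep it. A small sketch of the difference, using a made-up value that does not appear in the question's data:

```shell
# Hypothetical value with a decimal part (not from the question's data).
value='€4,515.50'

# [^0-9.] removes everything except digits and dots.
with_dots=$(printf '%s\n' "$value" | awk '{gsub("[^0-9.]", ""); print}')

# [^0-9] removes everything except digits.
digits_only=$(printf '%s\n' "$value" | awk '{gsub("[^0-9]", ""); print}')

echo "$with_dots"
echo "$digits_only"
```

If strictly integer output is required, the digits-only pattern is probably the safer choice.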

EDIT: A somewhat more idiomatic way to do the same in awk is to keep the column indexes in an array:

awk -F';' 'BEGIN{split("5,10,11",a,",")}{for(i in a){gsub("[^0-9]","",$a[i])}}1' OFS=';' inputfile

Upvotes: 2

anubhava

Reputation: 785581

You can use awk:

awk -F';' '{gsub(/[^0-9]/, "", $5); gsub(/[^0-9]/, "", $10); 
            gsub(/[^0-9]/, "", $11);} 1' OFS=';' file

1;1;Happy Feet;WB;546353;-32.3%;;121;-;4515;2841113;5;Australia, USA;:
2;3;The Departed;WB;435830;-34.8%;;85;-;5127;1149495;2;Unknown;:
3;2;Eragon;Fox;412229;-41.6%;;90;-;4580;1752715;3;UK, USA;:
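Since the question asks for a single command that also writes a file, the `tail` step can be piped straight into awk and the result redirected. A minimal sketch, assuming the input is readable without `sudo`; the file names `sample.txt` and `cleaned.txt` and the six placeholder header lines are made up for illustration:

```shell
# Build a small sample file: 6 placeholder header lines, then data rows.
# (Header text and file names are hypothetical.)
{
  printf 'header %s\n' 1 2 3 4 5 6
  printf '%s\n' '1;1;Happy Feet;WB;€546,353;-32.3%;;121;-;€4,515;€2,841,113;5;Australia, USA;:'
  printf '%s\n' '2;3;The Departed;WB;€435,830;-34.8%;;85;-;€5,127;€1,149,495;2;Unknown;:'
} > sample.txt

# Skip the first 6 lines, strip non-digits from fields 5, 10 and 11,
# and write the result to a new file.
tail -n +7 sample.txt |
  awk -F';' -v OFS=';' '{
    gsub(/[^0-9]/, "", $5)
    gsub(/[^0-9]/, "", $10)
    gsub(/[^0-9]/, "", $11)
  } 1' > cleaned.txt
```

Writing to a separate output file, rather than redirecting onto the input, avoids truncating the file before it has been read.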

Upvotes: 1
