Amit

Reputation: 23

Shell script - how to extract from line

Hi, please help me extract only the numbers. My file has a single line of data, as below:

53-Brand|5556-Color Family|10984-Fit|10313-Combo

Looking for output 53, 5556, 10984, 10313

Thanks


I tried

awk -F',' '{print $2}' /cat_formula > 1
    53-Brand|5556-Color Family|10984-Fit|10313-Combo

awk -F'|' '{print $1}{print $2}{print $3}{print $4}' 1 > 2
    53-Brand
    5556-Color Family
    10984-Fit
    10313-Combo

awk -F'-' '{print $1}' 2
    53
    5556
    10984
    10313

But I am looking for a single command line that does this.

Upvotes: 2

Views: 129

Answers (9)

nitinr708

Reputation: 1467

Sed is your friend:

echo "$VALUE" | sed -e 's/[^0-9|]//g' -e 's/|/, /g'

where the VALUE variable contains your input string.

Input: 53-Brand|5556-Color Family|10984-Fit|10313-Combo

Output: 53, 5556, 10984, 10313
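The pipeline above keeps every digit, so a label such as Brand7 would leak a 7 into the output. A minimal sketch of a variant that deletes the labels instead, assuming every label starts at a dash and runs up to the next pipe (the function name extract_ids_sed is only illustrative):

```shell
# Hypothetical helper: reads lines on stdin, prints the IDs joined by ", ".
extract_ids_sed() {
  # 1st expression: delete each "-label" (a dash and everything up to the next pipe)
  # 2nd expression: turn each remaining pipe into ", "
  sed -e 's/-[^|]*//g' -e 's/|/, /g'
}

printf '%s\n' '53-Brand|5556-Color Family|10984-Fit|10313-Combo' | extract_ids_sed
```

For the sample line this prints 53, 5556, 10984, 10313.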

Upvotes: 0

Scott Stensland

Reputation: 28285

Your input data is nicely chunked by two delimiters: first the pipe character |, then the dash -. As a preliminary step, this splits the string on the delimiter '|':

echo "53-Brand|5556-Color Family|10984-Fit|10313-Combo" | xargs -d'|' -I{} echo {}
53-Brand
5556-Color Family
10984-Fit
10313-Combo

For the full solution, this then splits each substring, which is now on its own line, on the delimiter '-':

echo "53-Brand|5556-Color Family|10984-Fit|10313-Combo" | xargs -d'|' -I{} echo {} | cut -d '-' -f1
53
5556
10984
10313
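The same two-step split (first on |, then on -) can also be done without external commands. This is only a sketch in POSIX shell; the function name extract_ids_sh is made up for illustration, and it assumes the whole line is passed as the first argument:

```shell
# Hypothetical helper: prints the text before the first "-" of each |-separated chunk.
# The body is a subshell, so the IFS and "set -f" changes do not leak to the caller.
extract_ids_sh() (
  IFS='|'   # split only on the pipe character, not on spaces
  set -f    # disable filename expansion for the unquoted $1 below
  for chunk in $1; do
    printf '%s\n' "${chunk%%-*}"   # drop everything from the first "-" onward
  done
)

extract_ids_sh '53-Brand|5556-Color Family|10984-Fit|10313-Combo'
```

This avoids starting one echo process per chunk, which is what xargs does here.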

Upvotes: 1

Ed Morton

Reputation: 203334

$ awk -F'[-|]' '{for (i=1;i<=NF;i+=2) print $i}' file
53
5556
10984
10313

Most of the answers so far will fail when a digit appears in text you do not want printed, or when a non-digit appears in text you do want printed; the command above will not. For example, with Brand7 instead of Brand and 53A instead of 53:

$ echo '53A-Brand7|5556-Color Family|10984-Fit|10313-Combo' | awk -F'[-|]' '{for (i=1;i<=NF;i+=2) print $i}'
53A
5556
10984
10313
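The question asks for the values on one line separated by ", ". A small sketch of how the same field loop could emit that format, assuming IDs and labels strictly alternate (the function name extract_ids_awk is illustrative only):

```shell
# Hypothetical helper: reads lines on stdin, prints every odd-numbered field joined by ", ".
extract_ids_awk() {
  awk -F'[-|]' '{
    for (i = 1; i <= NF; i += 2)
      # print a separator after the field unless it is the last odd field
      printf "%s%s", $i, (i + 2 <= NF ? ", " : "\n")
  }'
}

printf '%s\n' '53A-Brand7|5556-Color Family|10984-Fit|10313-Combo' | extract_ids_awk
```

With the modified sample this prints 53A, 5556, 10984, 10313.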

Upvotes: 1

Claes Wikner

Reputation: 1517

echo "53-Brand|5556-Color Family|10984-Fit|10313-Combo"|awk -F'[-|]' '{print $1","$3","$5","$7}'

53,5556,10984,10313

Upvotes: 1

user1934428

Reputation: 22225

Assuming your one-line data file is input.txt, you can basically achieve what you want with

tr -cs  '|0-9' ' ' <input.txt | tr  '|' ,

The first tr produces the spaces, the second one produces the commas.

However, be aware that this outputs no \n at the end. Depending on what you want to do with the result, this might or might not be what you want. If a trailing newline is important, you can for instance do

tr -cs  '|0-9' ' ' <input.txt | tr  '|' , ; echo

or the less performant

tr -cs  '|0-9' ' ' <input.txt | tr  '|' , | xargs

Upvotes: 1

RavinderSingh13

Reputation: 133458

Assuming your Input_file is the same as the sample shown, try the following awk:

awk  -F'[-|]' '{for(i=1;i<=NF;i++){if(i%2!=0){val=val?val "," $i:$i}};print val;val=""}'  Input_file

Explanation: - and | are set as field separators. The loop walks through all the fields one by one; whenever a field is at an odd position, its value is appended to the variable val. After the loop, val is printed and then reset.

EDIT: Adding one more solution, in case Input_file is the same as the sample shown.

awk '{gsub(/-[a-zA-Z]+\||-[a-zA-Z]+ [a-zA-Z]+\|/,",");sub(/-[a-zA-Z]+$/,"");print}'  Input_file

Upvotes: 1

Rahul Verma

Reputation: 3089

grep -oP "\d+" filename

Output:

53
5556
10984
10313

brief explanation:

-P : treat the pattern as a Perl-compatible regular expression
\d+: match one or more digits
-o : print only the matched parts, one per line
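grep -o prints one match per line, while the question asks for a single comma-separated line. One possible way to join the lines is paste; this sketch uses -E with [0-9]+ instead of -P, assumes a grep that supports -o (GNU and BSD grep do), and uses an illustrative function name:

```shell
# Hypothetical helper: reads text on stdin, prints all digit runs joined by ", ".
extract_ids_grep() {
  grep -oE '[0-9]+' |   # one digit run per line
    paste -sd, - |      # join all lines with commas
    sed 's/,/, /g'      # add a space after each comma
}

printf '%s\n' '53-Brand|5556-Color Family|10984-Fit|10313-Combo' | extract_ids_grep
```

Like the original command, this also picks up any digits that occur inside a label.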

Upvotes: 3

Frank-Rene Schäfer

Reputation: 3352

Using gensub(), which operates on the current line by default, is most likely the most elegant solution (note that gensub() is a GNU awk extension):

awk '{ print gensub(/-[^|]+\|?/, " ", "g"); }' tmp.txt

The regular expression /-[^|]+\|?/ matches everything from a - up to and including the following |, which is optional because it does not appear at the end of the line.
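For an awk without gensub(), the same idea can be sketched with the standard gsub() and sub() functions. Here the labels are replaced with ", " rather than a space, and the last label is handled separately because no pipe follows it; the function name strip_labels_awk is illustrative only:

```shell
# Hypothetical helper: reads lines on stdin, replaces each "-label|" with ", "
# and removes the final "-label".
strip_labels_awk() {
  awk '{
    gsub(/-[^|]*[|]/, ", ")   # "-label|" between two IDs becomes ", "
    sub(/-[^|]*$/, "")        # the last "-label" has no pipe after it
    print
  }'
}

printf '%s\n' '53-Brand|5556-Color Family|10984-Fit|10313-Combo' | strip_labels_awk
```

For the sample line this prints 53, 5556, 10984, 10313.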

Upvotes: 1

RomanPerekhrest

Reputation: 92854

Two approaches:

-- with grep:

grep -o '[[:digit:]]\+' file

-- with gawk:

awk -v FPAT='[0-9]+' '{ for(i=1;i<=NF;i++) print $i }' file

The output (for both approaches):

53
5556
10984
10313
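FPAT is specific to gawk. Where only a POSIX awk is available, a roughly equivalent sketch can walk the line with match() and substr(); the function name extract_digit_runs is just for illustration:

```shell
# Hypothetical helper: reads lines on stdin, prints every run of digits on its own line.
extract_digit_runs() {
  awk '{
    rest = $0
    # match() sets RSTART and RLENGTH to the position and length of the first digit run
    while (match(rest, /[0-9]+/)) {
      print substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)   # continue after the match
    }
  }'
}

printf '%s\n' '53-Brand|5556-Color Family|10984-Fit|10313-Combo' | extract_digit_runs
```

As with both approaches above, digits inside a label would be printed too.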

Upvotes: 2
