JohnnyOnPc

Reputation: 444

AWK - add value based on regex

I need to add up the numbers matched by a regex, using awk on Linux.

Basically from this file:

123john456:x:98:98::/home/john123:/bin/bash

I have to add the numbers 123 and 456 using awk, so the result would be 579.

So far I have done the following:

awk -F ':' '$1 ~ VAR+="/[0-9].*(?=:)/" ; {print VAR}' /etc/passwd

awk -F ':' 'VAR+="/[0-9].*(?=:)/" ; {print VAR}' /etc/passwd

awk -F ':' 'match($1, VAR=/[0-9].*?:/) ; {print VAR}' /etc/passwd

And from what I've seen, match() doesn't support this at all.

Does anyone have an idea?

UPDATE: it should also work for john123 (result -> 123) and 123john (result -> 123).
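For reference, match() can do this in plain POSIX awk. awk regexes simply have no lookaheads like (?=:) or lazy quantifiers like .*?. Instead, match() sets RSTART and RLENGTH, so you can pull one digit run at a time out of $1 and add it up. A minimal sketch; sum_first_field is just an illustrative name, not anything standard:

```shell
# sum_first_field (hypothetical helper name): reads passwd-style lines on stdin
# and prints the sum of every digit run in the first ':'-separated field.
sum_first_field() {
  awk -F ':' '{
    s = 0
    f = $1                                # e.g. "123john456"
    while (match(f, /[0-9]+/)) {          # RSTART/RLENGTH locate the next digit run
      s += substr(f, RSTART, RLENGTH)     # add that run as a number
      f = substr(f, RSTART + RLENGTH)     # keep scanning after it
    }
    print s                               # 0 when the field has no digits
  }'
}

printf '%s\n' '123john456:x:98:98::/home/john123:/bin/bash' \
              'john123:x:98:98::/home/john123:/bin/bash' \
              '123john:x:98:98::/home/john123:/bin/bash' | sum_first_field
```

This prints 579, 123 and 123, one per line. Because it loops until match() fails, any number of digit runs in the field gets summed, not just two.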

Upvotes: 0

Views: 235

Answers (8)

RARE Kpop Manifesto

Reputation: 2807

echo '
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
john:x:98:98::/home/john123:/bin/bash'    | 

awk '($0 += $2)_' FS='[^0-9]+(:.+)?'

579
123
123
0

This runs on just about any awk you can find.
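To see why this works, here is a small sketch that prints the two operands of ($0 += $2) for each sample line. Because awk regex matching is leftmost-longest, the optional (:.+) lets the separator run from the first ':' (or from letters just before it) to the end of the line. That leaves $2 as the digits after the first non-digit run, while $0 used as a number is just its leading digits:

```shell
# Print the two numbers that ($0 += $2) adds, using the same FS as above.
printf '%s\n' '123john456:x:98:98::/home/john123:/bin/bash' \
              'john123:x:98:98::/home/john123:/bin/bash' \
              '123john:x:98:98::/home/john123:/bin/bash' |
awk -F '[^0-9]+(:.+)?' '{
  # $0 + 0 keeps only the leading digits of the record (0 if none);
  # $2 is the digit run after the first non-digit stretch (empty -> 0).
  print $0 + 0, $2 + 0
}'
```

The pairs come out as 123 456, 0 123 and 123 0, which add up to the 579, 123 and 123 shown above.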

Upvotes: 0

anubhava

Reputation: 784878

Here is another awk variant that adds all the numbers present in the first :-separated field:

cat file
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
1j2o3h4n5:x:98:98::/home/john123:/bin/bash

awk -F '[^0-9:]+' '{s=0; for (i=1; i<=NF; i++) {s+=$i; if ($i~/:$/) break} print s}' file

579
123
123
15

Upvotes: 0

stack0114106

Reputation: 8711

You can also try Perl:

$ cat johnny.txt
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash

$ perl -F: -lane ' $_=$F[0]; $sum+= $1 while(/(\d+)/g); print $sum; $sum=0 ' johnny.txt
579
123
123

$

Upvotes: 0

blhsing

Reputation: 106435

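You can use [^0-9]+ as the field separator and :[^\n]*\n as the record separator instead (a regex RS like this needs an awk that supports multi-character RS, such as gawk or mawk; POSIX leaves multi-character RS unspecified):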

awk -F '[^0-9]+' 'BEGIN{RS=":[^\n]*\n"}{print $1+$2}' /etc/passwd

so that, given this content of /etc/passwd:

123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash

This outputs:

579
123
123

Upvotes: 0

vintnes

Reputation: 2030

I used awk's split() to split the first field on any run of non-digit characters.

split(string, target_array, [regex], [separator_array]*)

* The separator_array argument requires gawk.

$ awk -F: '{split($1, A, /[^0-9]+/, S); print S[1], A[1]+A[2]}' <<EOF
123john456:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
EOF

john 579
john 123
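If gawk isn't available, a rough POSIX-awk stand-in can pick up the first separator with match() instead of split()'s fourth argument. This is a sketch that assumes the first non-digit run in $1 is the separator you want to print, which matches what S[1] held for these inputs:

```shell
# Portable stand-in for the gawk-only 4th split() argument:
# match() finds the first non-digit run in $1 and substr() extracts it.
awk -F: '{
  sep = match($1, /[^0-9]+/) ? substr($1, RSTART, RLENGTH) : ""
  split($1, A, /[^0-9]+/)
  print sep, A[1] + A[2]
}' <<EOF
123john456:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash
EOF
```

It should print the same john 579 and john 123 lines as the gawk version.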

Upvotes: 0

Ed Morton

Reputation: 203129

$ awk -F':' '{split($1,t,/[^0-9]+/); print t[1] + t[2]}' file
579

With your updated requirements:

$ cat file
123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash
123john:x:98:98::/home/john123:/bin/bash

$ awk -F':' '{split($1,t,/[^0-9]+/); print t[1] + t[2]}' file
579
123
123
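Why t[1] + t[2] covers all three shapes: splitting on /[^0-9]+/ yields an empty first element when the field starts with letters and an empty last element when it ends with letters, and awk treats an empty string as 0 in arithmetic. A small sketch to inspect what split() returns for each shape:

```shell
# Show the chunks split() produces for each first-field shape in the question.
for f in 123john456 john123 123john; do
  echo "$f" | awk '{
    n = split($0, t, /[^0-9]+/)
    printf "%s: n=%d", $0, n
    for (i = 1; i <= n; i++) printf " [%s]", t[i]   # empty chunks show as []
    print ""
  }'
done
```

Each shape gives exactly two chunks (123john456: [123] [456], john123: [] [123], 123john: [123] []), so adding t[1] and t[2] is enough for these inputs.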

Upvotes: 2

Wiktor Stribiżew

Reputation: 626691

You may use

awk -F ':' '{n=split($1, a, /[^0-9]+/); b=0; for (i=1;i<=n;i++) { b += a[i]; }; print b; }' /etc/passwd

Demo (uses a Bash here-string):

s="123john456:x:98:98::/home/john123:/bin/bash
john123:x:98:98::/home/john123:/bin/bash"
awk -F ':' '{n=split($1, a, /[^0-9]+/); b=0; for (i=1;i<=n;i++) { b += a[i]; }; print b; }' <<< "$s"

Output:

579
123

Details

  • -F ':' - records are split into fields on the : character
  • n=split($1, a, /[^0-9]+/) - splits field 1 into digit-only chunks, saves them in array a, and sets n to the number of chunks
  • b=0 - b will hold the sum
  • for (i=1;i<=n;i++) { b += a[i]; } - iterates over array a and sums the values
  • print b - prints the result.

Upvotes: 0

F. Knorr

Reputation: 3055

With gawk and for the given example

awk -F ':' '{a=gensub(/[a-zA-Z]+/,"+", "g", $1); print a}' inputFile | bc

would do the job. A more general version:

awk -F ':' '{a=gensub(/[a-zA-Z]+/,"+", "g", $1); a=gensub(/^\+/,"","g",a); a=gensub(/\+$/,"","g",a); print a}' inputFile | bc

The regex part replaces every run of letters with '+' (e.g., '12johnny34' becomes 12+34), and bc then evaluates the resulting arithmetic expression. (To be safe, I remove leading and trailing '+' signs with ^\+ and \+$.)
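gensub() is gawk-only. Here is a sketch of the same idea using POSIX gsub() and sub(), which edit a variable in place, so it should also work in non-gawk awks. to_bc_expr is a hypothetical helper name, and like the version above it assumes the first field contains only letters and digits:

```shell
# to_bc_expr (hypothetical name): turn the first ':'-field into a bc expression.
to_bc_expr() {
  awk -F ':' '{
    e = $1
    gsub(/[a-zA-Z]+/, "+", e)   # 12johnny34 -> 12+34
    sub(/^[+]/, "", e)          # john123 -> +123 -> 123
    sub(/[+]$/, "", e)          # 123john -> 123+ -> 123
    print e
  }'
}

printf '%s\n' '123john456:x:98:98::/home/john123:/bin/bash' \
              'john123:x:98:98::/home/john123:/bin/bash' \
              '123john:x:98:98::/home/john123:/bin/bash' | to_bc_expr
```

This prints 123+456, 123 and 123. Appending | bc to the pipeline then evaluates each line, as in the gawk version.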

Upvotes: 0
