MKLLKM
Reputation: 21

sentences that have at least 2 numbers or more in SED

Using only sed (Ubuntu 20.04), I need to print the sentences (lines) that contain at least 2 numbers, and then print only the first two words of each such sentence. I was able to do the second part, but I do not know how to do the first.

this is the file:

 ab      c1d
dea   1 a zz7 www44
xy12    abc xyz
xy1 ab XYZ
xy ab X2YZ 3

And this is what I've done so far:

sed -E "s/^[ ]*([^ ]+[ ]+[^ ]+).*/\1/" $* > 123

Upvotes: 2

Views: 235

Answers (2)

Simon Dehaut
Reputation: 2677

If you just want to use sed to print the first 2 words of each line that contains at least 2 consecutive digits:

sed -nE '/[0-9]{2,}/p' ./yourFile.txt | sed -E 's/^\s*(\S+\s+\S+).*$/\1/'
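  • /[0-9]{2,}/ : lines that contain a run of at least 2 consecutive digits (a line such as "xy ab X2YZ 3", whose two digits are separated by other characters, is not matched)
  • /^\s*(\S+\s+\S+).*$/ : a line that begins with zero or more whitespace chars, then a captured group of (one or more non-space chars)(one or more space chars)(one or more non-space chars), and then any remaining characters up to the end of the line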

EXAMPLE :

input :

 ab      c1d
dea   1 a zz7 www44
xy12    abc xyz
xy1 ab XYZ
xy ab X2YZ 3

output :

dea   1
xy12    abc

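If you also want to collapse the run of spaces between the first two words into a single space, you can pipe the result into sed one more time (note the trailing backslashes, which the shell needs to continue the pipeline across lines):

sed -nE '/[0-9]{2,}/p' ./yourFile.txt \
    | sed -E 's/^\s*(\S+\s+\S+).*$/\1/' \
    | sed -E 's/\s+/ /'
  • s/\s+/ / : s for substitute, \s+ matches a run of consecutive whitespace chars, and / / replaces it with a single space. Without the g flag only the first run is replaced, which is enough here because the previous step leaves only one run per line.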

so in that case output will be :

dea 1
xy12 abc
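
For comparison, here is a small sketch of how /[0-9]{2,}/ differs from a pattern that accepts two digits anywhere on the line. It assumes the sample input from the question; the temporary file and the variable names (input_file, consecutive, anywhere) are only illustrative.

```shell
# Sample input from the question, written to a temporary file (illustrative path).
input_file=$(mktemp)
printf '%s\n' ' ab      c1d' 'dea   1 a zz7 www44' 'xy12    abc xyz' \
    'xy1 ab XYZ' 'xy ab X2YZ 3' > "$input_file"

# Two or more *consecutive* digits: only the lines with www44 and xy12 qualify.
consecutive=$(sed -nE '/[0-9]{2,}/p' "$input_file")

# Two digits anywhere on the line, possibly separated by non-digit characters.
anywhere=$(sed -nE '/[0-9][^0-9]*[0-9]/p' "$input_file")

printf 'consecutive digits:\n%s\n' "$consecutive"
printf 'digits anywhere:\n%s\n' "$anywhere"

rm -f "$input_file"
```

The second pattern additionally selects "xy ab X2YZ 3", so which one to use depends on whether "2 numbers" means two adjacent digits or two digits in total.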

Upvotes: 1

Wiktor Stribiżew
Reputation: 627536

You can use

sed -En '/[0-9][^0-9]*[0-9]/{s/^ *([^ ]+ +[^ ]+).*/\1/p}' file
awk '/[0-9][^0-9]*[0-9]/{print $1" "$2}' file

In both cases, a line with at least two digits is detected with the /[0-9][^0-9]*[0-9]/ regex (a digit, zero or more chars other than digits, then another digit). In the sed solution, the first two words are captured, the rest of the line is matched and removed, and -n together with the p flag prints only the lines where that substitution happened. In the awk solution, only the first two words (the first and second fields) are printed, joined with a single space.

See an online demo:

s=' ab      c1d
dea   1 a zz7 www44
xy12    abc xyz
xy1 ab XYZ
xy ab X2YZ 3'
sed -En '/[0-9][^0-9]*[0-9]/{s/^([^[:space:]]+ +[^[:space:]]+).*/\1/p}' <<< "$s"
echo "Now, awk..."
awk '/[0-9][^0-9]*[0-9]/{print $1" "$2}' <<< "$s"

Both return the first two words of each matching line. sed keeps the spaces between them intact:

dea   1
xy12    abc
xy ab

awk keeps just one:

dea 1
xy12 abc
xy ab
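
If the single-space output is wanted from sed alone, one possible variation (a sketch, not part of the commands above) is to capture the two words in separate groups and rejoin them with one space. The variable name result is only illustrative.

```shell
s=' ab      c1d
dea   1 a zz7 www44
xy12    abc xyz
xy1 ab XYZ
xy ab X2YZ 3'

# Select lines with at least two digits, capture word 1 and word 2 separately,
# and rebuild the line as "word1 word2" with exactly one space between them.
result=$(printf '%s\n' "$s" |
    sed -En '/[0-9][^0-9]*[0-9]/{s/^ *([^ ]+) +([^ ]+).*/\1 \2/p;}')

printf '%s\n' "$result"
```

On the sample input this prints the same three lines as the awk command.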

Upvotes: 0
