Micael Illos

Reputation: 492

Sed print only lines with at least 2 words

Hi, I'm trying to write this using only simple sed commands. Here is my question:

I want to print certain lines from a file, subject to these conditions:

-Only if the line has at least 2 words

-Only if the second word has at least 3 characters

-Lines that meet both conditions must be printed unchanged, except that each of the first 3 characters of the second word must be doubled.

EDIT: This part can be done in AWK:

-The last line of the output must be the number of lines from the original file that were not included

Example:

abc2 1 def2 3 abc2
dea 123 123 zy45
12 12
abc cd abc cd
xyz%$@! x@yz%$@! kk
xyzxyz
abc htzw 

Output:

dea 112233 123 zy45
xyz%$@! xx@@yyz%$@! kk
abc hhttzzw
4

This is my current code:

sed -r '/[ ]*([^ ]+[ ]){2,}/!d' ex >| tmp
sed -r '/[ ]*[^ ]+[ ][^ ]{3,}/!d' tmp >| tmp2
sed -r 's/([ ]*[^ }+[ ])([^ ])([^ ])([^ ])(*)/\1 \2 \2 \3 \3 \4 \4 \5/' tmp2 >| tmp

But I get an error that I can't fix, and I can't figure out how to print the number 4 (see the example).

The error:

sed: -e expression #1, char 62: Invalid preceding regular expression

Any help would be great :)

Upvotes: 2

Views: 339

Answers (3)

Micael Illos

Reputation: 492

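# Lines with at least two words go to tmp; all other lines go to delete.
sed -r '/^[ ]*[^ ]+[ ][^ ]+([ ]|$)/!d' ex >| tmp
sed -r '/^[ ]*[^ ]+[ ][^ ]+([ ]|$)/d' ex >| delete
# Lines in tmp whose second word has fewer than 3 characters are excluded too.
sed -r '/^[ ]*[^ ]+[ ][^ ]{3,}/d' tmp >> delete
# Keep the lines whose second word has at least 3 characters.
sed -r '/^[ ]*[^ ]+[ ][^ ]{3,}/!d' tmp >| yolo
# Double the first 3 characters of the second word and print the result.
sed -r 's/(^[ ]*[^ ]+[ ])([^ ])([^ ])([^ ])(.*)/\1\2\2\3\3\4\4\5/' yolo
# Print the number of excluded lines.
sed -n '$=' delete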
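
For reference, the error in the question appears to come from the `(*)` group: in an extended regex, `*` needs something in front of it to repeat, which is why GNU sed reports "Invalid preceding regular expression". The `[^ }+` typo raises no error but changes what the bracket expression matches, and the spaces in the replacement would be inserted literally. Below is a minimal sketch of the corrected substitution on its own. It assumes GNU sed (where `-E` is a synonym for `-r`) and single-space separators; the function name `double_second_word` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: doubles the first three characters of the second word.
# Assumes an extended-regex sed (-E) and words separated by single spaces.
double_second_word() {
  # (.*) replaces the invalid (*), [^ ]+ replaces the mistyped [^ }+,
  # and the replacement contains no spaces between the back-references.
  sed -E 's/^([ ]*[^ ]+[ ])([^ ])([^ ])([^ ])(.*)/\1\2\2\3\3\4\4\5/'
}

printf '%s\n' 'dea 123 123 zy45' | double_second_word
# prints: dea 112233 123 zy45
```

Lines that do not match are passed through unchanged, so the filtering steps above are still needed before this substitution.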

Upvotes: 0

MiniMax

Reputation: 1093

The first part of your task can be done with GNU sed:

sed -rn 's/^([^ ]+ )([^ ])([^ ])([^ ])/\1\2\2\3\3\4\4/; T; p' input.txt

T label - If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script. This is a GNU extension.

Output

dea 112233 123 zy45
xyz%$@! xx@@yyz%$@! kk
abc hhttzzw
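
Since the `T` command used above is a GNU extension, here is a sketch of the same filter that may be more portable. It uses the `p` flag of `s///`, which prints the line only when the substitution succeeded, together with `-n` to suppress all other output. The function name `print_matching` is hypothetical, and `-E` is assumed to be available (it is in GNU and BSD sed).

```shell
#!/bin/sh
# Sketch: the same filter without the GNU-only T command.
# The p flag prints the line only when the substitution succeeded;
# -n suppresses the default output for every other line.
print_matching() {
  sed -En 's/^([^ ]+ )([^ ])([^ ])([^ ])/\1\2\2\3\3\4\4/p' "$@"
}

# With no file argument, sed reads the sample lines from standard input.
print_matching <<'EOF'
abc2 1 def2 3 abc2
dea 123 123 zy45
12 12
abc cd abc cd
xyz%$@! x@yz%$@! kk
xyzxyz
abc htzw
EOF
```

On the sample input this should print the same three lines as the `T` version.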

Version that also prints the number of lines not included

#!/bin/bash

sed -rn '
    s/^([^ ]+ )([^ ])([^ ])([^ ])/\1\2\2\3\3\4\4/
    T branch
    p; d
    :branch
    w not_included.txt
' input.txt

wc -l < not_included.txt

Output

dea 112233 123 zy45
xyz%$@! xx@@yyz%$@! kk
abc hhttzzw
4
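
The edit in the question suggests that AWK is acceptable for the count, so the temporary file could be avoided with something like the sketch below. It counts every line that fails either condition. The function name `count_excluded` is hypothetical, and note that awk splits fields on tabs as well as spaces, unlike the sed patterns above.

```shell
#!/bin/sh
# Hypothetical helper: counts the lines that fail either condition,
# i.e. fewer than two words, or a second word shorter than 3 characters.
count_excluded() {
  # n + 0 makes awk print 0 instead of an empty string when nothing is excluded.
  awk 'NF < 2 || length($2) < 3 { n++ } END { print n + 0 }' "$@"
}

count_excluded <<'EOF'
abc2 1 def2 3 abc2
dea 123 123 zy45
12 12
abc cd abc cd
xyz%$@! x@yz%$@! kk
xyzxyz
abc htzw
EOF
# prints: 4
```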

Upvotes: 1

ctac_

Reputation: 2491

You can try this (sed + bash)

nb=$(sed -n '$=' infile)
sed -E '
  /([^ ]* )([^[:space:]]*)(.*)/h
  s//\2/
  tA
  d
  :A
  s/([^[:space:]])([^[:space:]])([^[:space:]])(.*)/\1\1\2\2\3\3\4/
  tB
  d
  :B
  G
  s/(.*)\n([^ ]* )([^[:space:]]*)(.*)/\2\1\4/
' infile > infilebis
cat infilebis
echo $(($nb - $(sed -n '$=' infilebis)))
rm infilebis
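
One caveat with the count step: `sed -n '$='` prints nothing for an empty file, so if no line qualifies, the arithmetic expansion above gets an empty operand and fails. A small sketch of an alternative using `wc -l`, which prints 0 for an empty file, follows. The function name `excluded_count` and the sample data are hypothetical, and it assumes every line ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: number of lines in $1 that are missing from $2,
# assuming $2 holds only (transformed) lines that came from $1.
excluded_count() {
  total=$(wc -l < "$1")   # wc -l prints 0 for an empty file
  kept=$(wc -l < "$2")
  echo $(( total - kept ))
}

orig=$(mktemp) && kept_file=$(mktemp) || exit 1
printf '%s\n' 'a bcd' 'x' 'y z' > "$orig"
: > "$kept_file"                      # pretend no line qualified
excluded_count "$orig" "$kept_file"   # prints: 3
rm -f "$orig" "$kept_file"
```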

Upvotes: 1
