Testdev01

Reputation: 13

Extracting certain words from a text file using tr and awk

This has been giving me a lot of trouble. My file looks like this:

URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex

The file contains more sets of data just like these.

I used:

grep http -A 3|tr '\n' ' '|tr '|' ' '|awk '{print $2,$7,$8}'|tr ' ' ':'

The output contains only the first set of data:

123.123.123.123:email:phone

Intended outcome:

123.123.123.123:email:phone
1.2.3.4:emailx:phonex

Upvotes: 1

Views: 155

Answers (6)

mrqiao001

Reputation: 152

perl -00 -nE 'say join":",$1,$2,$3 if /\/\/(.*)\n.*\n.*\|(\w+)\|(\w+)/' file

123.123.123.123:email:phone
1.2.3.4:emailx:phonex

Upvotes: 0

Daweo

Reputation: 36660

I would use the getline function for this task. Let the content of file.txt be

URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex

then

awk 'BEGIN{FS="|";OFS=":"}sub(/^URL: /,""){url=$0;getline;getline;print url,$3,$4}' file.txt

gives output

http://123.123.123.123:email:phone
http://1.2.3.4:emailx:phonex

Explanation: I inform GNU AWK that the field separator (FS) is a pipe (|) and the output field separator (OFS) is a colon (:). I rely on two effects of sub: it alters the line and it returns a value. If a substitution occurred, I save the current line (with the leading URL: removed by sub) as url, call getline twice to reach the line after the next one, and then print url together with the 3rd and 4th fields. Note that sub removes only the leading "URL: ", so the http:// prefix remains in the output.

(tested in GNU Awk 5.0.1)
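As a variation on the command above, the pattern passed to sub can be widened so the scheme is removed as well, which would match the intended outcome in the question. This is only a sketch: the temporary file and the variable names tmp and out are illustrative assumptions, and it assumes every record has exactly the three-line layout shown in the question.

```shell
# Recreate the sample input from the question in a temporary file (illustrative).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
EOF

# Same getline approach, but sub() also strips "http://" (or "https://").
out=$(awk 'BEGIN{FS="|";OFS=":"}
sub(/^URL: https?:\/\//,""){url=$0;getline;getline;print url,$3,$4}' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

The only change from the original command is the regular expression; the two getline calls still assume the pipe-separated line is always the second line after the URL line.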

Upvotes: 0

Jetchisel

Reputation: 7831

If ed is available/acceptable.

The script.ed

g/^$/d
g|^URL: http://|s|||\
+d
%s/^.*user[^|]*//
g/./;+j
%s/|/:/g
,p
Q

Run

ed -s file.txt < script.ed

Upvotes: 0

M. Nejat Aydin

Reputation: 10133

I'd do it like this:

awk -F\| '
    /^URL:/ { sub(/.*\/\//,""); url=$0; next   }
      NF==4 { printf "%s:%s:%s\n", url, $3, $4 }
' file

Upvotes: 1

RARE Kpop Manifesto

Reputation: 2875

gawk 'gsub("[|]", ":", $!(NF = NF))' RS= OFS= FS='.+//|\n[^|]*[|][^|]*'

123.123.123.123:email:phone
1.2.3.4:emailx:phonex

Upvotes: 1

tripleee

Reputation: 189739

If you are using Awk anyway, you can get rid of grep and tr.

If you can rely on the empty line to separate records, try RS='\n\n'. Here's a refactoring which instead extracts the information from the second line after the hit (the third line of each record).

awk '/http/ { l=2; ip=$0; sub(/.*\/\//, "", ip); next }
l && --l == 0 { tail=$0; sub(/^[^|]*[|][^|]*[|]/, "", tail);
    sub(/[|]/, ":", tail); print ip ":" tail }'

Perhaps /^URL:/ would be a better regex than /http/ for finding the beginning of a record.
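The record-separator idea mentioned above could look roughly like the following. This is a minimal sketch, not the answer's own code: it uses RS="" (awk's paragraph mode) in place of RS='\n\n', assumes records are separated by blank lines and always have three lines, and the names tmp2, out2 and f are illustrative.

```shell
# Recreate the sample input from the question in a temporary file (illustrative).
tmp2=$(mktemp)
cat > "$tmp2" <<'EOF'
URL: http://123.123.123.123
file: php
124.124.124.124|user1|email|phone

URL: http://1.2.3.4
file: php
2.1.3.1|userx|emailx|phonex
EOF

# Paragraph mode: each blank-line-separated block is one record,
# and with FS="\n" each line of the block is one field.
out2=$(awk 'BEGIN { RS = ""; FS = "\n" }
{
    url = $1
    sub(/.*\/\//, "", url)      # drop everything up to and including "//"
    split($3, f, "[|]")         # third line: ip|user|email|phone
    print url ":" f[3] ":" f[4]
}' "$tmp2")

printf '%s\n' "$out2"
rm -f "$tmp2"
```

Reading whole records this way avoids counting lines after a match, at the cost of depending on the blank separator lines being present.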

Upvotes: 1
