user1448731

Reputation: 55

Extract a particular string from a file and output to another file using grep, awk, sed

I have a file that contains the following lines:

2013-09-08 21:00:54 SMTP connection from [78.110.75.245]:5387 (TCP/IP connection count = 20)
2013-09-08 21:00:54 SMTP connection from [188.175.142.13]:34332 (TCP/IP connection count = 20)
2013-09-08 21:45:41 SMTP connection from [58.137.11.145]:51984 (TCP/IP connection count = 20)
2013-09-08 21:49:26 SMTP connection from [109.93.248.151]:22273 (TCP/IP connection count = 20)
2013-09-08 21:49:27 SMTP connection from [37.131.64.203]:7906 (TCP/IP connection count = 20)

What I want to do is extract the IP address only and save it to a file.

I started with this

sed '^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$' file > ips

But I couldn't make it work.

Upvotes: 1

Views: 2429

Answers (3)

Chris Seymour

Reputation: 85785

In practice I would probably go with jasonwryan's solution, but to answer why your sed command doesn't work: you are using extended regular expression (ERE) syntax and even Perl-compatible regular expression (PCRE) syntax. To use ERE with sed you need to turn it on explicitly, using -r with GNU sed or -E with BSD variants. sed doesn't support PCRE at all, but you can simply drop the non-capturing group (?:...) as it doesn't really help here anyway.

As you are just pattern matching, grep is probably a better fit than sed:

$ grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' file
78.110.75.245
188.175.142.13
58.137.11.145
109.93.248.151
37.131.64.203

Notice that the anchors ^ and $ also need dropping, as the pattern you want to match neither starts at the beginning of the line nor ends at the end of it. grep doesn't use extended regular expressions by default either, so -E is needed, and -o prints only the matching part of the line rather than the whole line.
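To get the addresses into a file, as the question asks, the grep output can simply be redirected. A minimal sketch, assuming a grep that supports -E and -o (GNU and BSD grep both do); the file names sample.log and ips are placeholders, and the two sample lines are copied from the question:

```shell
# Recreate two of the log lines from the question (sample.log is a placeholder name).
cat > sample.log <<'EOF'
2013-09-08 21:00:54 SMTP connection from [78.110.75.245]:5387 (TCP/IP connection count = 20)
2013-09-08 21:49:27 SMTP connection from [37.131.64.203]:7906 (TCP/IP connection count = 20)
EOF

# -E enables extended regular expressions, -o prints only the matched text.
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' sample.log > ips

# Show what was saved.
cat ips
```

Note that -o prints every match on a line, so a line containing two dotted quads would produce two output lines.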

The final problem is that you have just given sed a regular expression and a file. sed is not grep and won't just print out lines that match (although of course it can, this just isn't how you do it). One approach is to use the substitution command s to replace the whole line with just the IP, discarding everything before and after it:

$ sed -r 's/.+[[]([^]]+).+/\1/' file
78.110.75.245
188.175.142.13
58.137.11.145
109.93.248.151
37.131.64.203

Regexplanation:

s    # sed substitute command 
/    # the delimiter marking the start of the regexp
.+   # one or more of any character
[    # start a character class
[    # character class contains a single opening square bracket 
]    # close character class (needed so single [ isn't treated as unclosed)
(    # start capture group
[    # start character class
^]   # negated: any character that is not a ]
]    # end character class
+    # one or more of the preceding class
)    # end capture group 
.+   # one or more of any character
/    # the delimiter marking the end of the regexp and start of replacement
\1   # the first capture group
/    # the delimiter marking the end of the replacement 
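The remark that sed can print only the matching lines can be sketched as follows. This is an illustration rather than part of the answer above: -n suppresses sed's automatic printing, and the p flag on the substitution prints a line only when the substitution succeeded, so lines without a bracketed address are dropped instead of being passed through unchanged. It assumes a sed that accepts -E, as current GNU and BSD versions do; sample.log and its unrelated middle line are made up for the example.

```shell
# Two lines from the question plus one invented line with no address in it.
cat > sample.log <<'EOF'
2013-09-08 21:00:54 SMTP connection from [78.110.75.245]:5387 (TCP/IP connection count = 20)
2013-09-08 21:01:00 some unrelated line without an address
2013-09-08 21:45:41 SMTP connection from [58.137.11.145]:51984 (TCP/IP connection count = 20)
EOF

# -n: do not print by default; trailing p: print only lines where s/// matched.
sed -n -E 's/.+[[]([^]]+).+/\1/p' sample.log > ips

cat ips
```

Without -n and p, the unrelated line would appear in the output untouched, which is usually not what you want when building a list of addresses.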

Here is a comparison of different regular expression flavours.

Upvotes: 1

jasonwryan

Reputation: 4554

Using awk:

$ awk -F'[][]' '{print $2}' log.file > addresses
$ cat addresses
78.110.75.245
188.175.142.13
58.137.11.145
109.93.248.151
37.131.64.203
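For anyone wondering how that field separator works: -F'[][]' is a bracket expression containing ] and [, so awk splits each line at either bracket and the address lands in the second field. A small sketch to make the split visible, using one line from the question (the variable names line and ip are just for illustration):

```shell
line='2013-09-08 21:00:54 SMTP connection from [78.110.75.245]:5387 (TCP/IP connection count = 20)'

# Print each field with its index to show where the brackets split the record.
printf '%s\n' "$line" | awk -F'[][]' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'

# The second field is the text between [ and ].
ip=$(printf '%s\n' "$line" | awk -F'[][]' '{ print $2 }')
echo "$ip"
```

The ] has to come first inside the bracket expression so that it is taken literally rather than as the closing bracket.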

Upvotes: 1

hek2mgl

Reputation: 157992

You could match the content between the brackets [] with sed:

sed 's/.*\[\(.*\)\].*/\1/' log.file

Upvotes: 0
