Or Shachar

Reputation: 59

Cutting a string on multiple delimiters with awk or sed

I am using a SIPp server simulator to verify incoming calls. What I need to verify is the caller ID and the dialed digits. I've logged this information to a file, in which each line now looks like this, for example:

From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}

What I want is to convert it into a CSV file containing just the two phone numbers, like this:

972526134661,972526134662

and so on.

I've tried using awk -F, but that seems to let me use only one delimiter at a time: either sip:, or @, or /.

Basically, what I want to do is take every string that begins with < and ends with >, and then take the part that follows the sip: delimiter.

Using the cut command is also not an option since, as I understand it, cut cannot use multi-character strings as delimiters.

I guess it should be really simple, but I haven't found quite the right tool for it. I'd appreciate the help, thanks!

Upvotes: 1

Views: 3344

Answers (3)

Bill Woodger

Reputation: 13076

OK, for fun, here is a version that uses some sample data (from your original post) and awk -F, as you originally wanted.

Note that because your file is "generated", we can assume a regular format for the data and need not expect the "short" patterns to cause false matches.

[g]awk -F'sip:|@' -v OFS="," '{print $2,$4}' yourlogfile

It uses both sip: and @ as the Field Separator, by means of the alternation operator |. It can easily be extended so that further characters or strings also separate fields in the input, if required. This works because the built-in variable FS, which -F sets, can hold a regular expression rather than just a single character.
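To see why the numbers end up in $2 and $4, it can help to print every field that this FS produces. The following is a small sketch that assumes only a POSIX sh and an awk with regex field separators (gawk, mawk and BWK awk all qualify); the variable names line and fields are just for illustration.

```shell
#!/bin/sh
# Sample line copied from the question.
line='From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}'

# Split on the regex 'sip:|@' and print each field with its number,
# so it is clear which field numbers hold the phone numbers.
fields=$(printf '%s\n' "$line" |
  awk -F'sip:|@' '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }')

printf '%s\n' "$fields"
```

For the sample line this shows five fields, with the caller in $2 and the dialed number in $4; the text around them ("From: <", "server>;tag=60=.To: <", "server>}") lands in $1, $3 and $5.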

For that first sample in your question, it yields this:

972526134661,972526134662

For the latest (revision 8) version of your question, and guessing at what you want:

[g]awk -F'sip:|@|to_number:' -v OFS="," '{print $2,$5}' yourlogfile

Yields this:

from_number,972526134662

I wrote [g]awk because I used gawk on my machine; I got the same behaviour with plain awk.

A slight amendment in style, suggested by @fedorqui: use the command-line option -v to set the Output Field Separator (OFS, an AWK built-in variable that can be set with -v like any other variable), and separate the print arguments with a comma, so that they are output as separate fields rather than as a single string built with a hard-coded ",".
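A quick way to compare the two styles is to run both on the sample line. This sketch assumes a POSIX sh and awk; the variable names are illustrative only. It shows that they produce the same CSV here, and that with OFS the separator can be changed in one place (for example to a tab, since awk processes escape sequences such as \t in -v assignments).

```shell
#!/bin/sh
line='From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}'

# Hard-coded separator: $2, "," and $4 are concatenated into one string.
concat=$(printf '%s\n' "$line" | awk -F'sip:|@' '{ print $2 "," $4 }')

# OFS style: the comma in print inserts OFS between the two fields.
ofs=$(printf '%s\n' "$line" | awk -F'sip:|@' -v OFS=',' '{ print $2, $4 }')

# Changing the output format only needs a different OFS value.
tabbed=$(printf '%s\n' "$line" | awk -F'sip:|@' -v OFS='\t' '{ print $2, $4 }')

printf '%s\n%s\n%s\n' "$concat" "$ofs" "$tabbed"
```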

Upvotes: 2

s-ol

Reputation: 1754

You can use a regex replace, as long as the format stays the same (order is always From/To):

sed -E 's/^.*sip:([0-9]+)@.*sip:([0-9]+)@.*$/\1,\2/' yourlogfile

It's not a very strict or perfect solution, but in most cases an approach like this is enough. Note that lines which don't match the pattern are printed unchanged.
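Here is a small self-contained sketch that runs this substitution on three lines: the sample from the question, a hypothetical second line with the numbers swapped, and a hypothetical line that does not match. It assumes a sed that supports -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# Line 1 is from the question; lines 2 and 3 are made up for illustration.
input=$(printf '%s\n' \
  'From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}' \
  'From: <sip:972526134662@server>;tag=60=.To: <sip:972526134661@server>}' \
  'no SIP URI here')

# Capture the digits between each "sip:" and the following "@".
csv=$(printf '%s\n' "$input" | sed -E 's/^.*sip:([0-9]+)@.*sip:([0-9]+)@.*$/\1,\2/')

printf '%s\n' "$csv"
```

The first two lines become 972526134661,972526134662 and 972526134662,972526134661, while the third is passed through as-is, because sed prints every line unless -n is given.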

Upvotes: 0

Tom Fenech

Reputation: 74595

I would suggest using sed to extract the two numbers:

$ sed -n 's/^From: <sip:\([0-9]*\).*To: <sip:\([0-9]*\).*/\1,\2/p' file
972526134661,972526134662

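The regular expression matches a line beginning with From and captures the two numbers after <sip:. The -n option together with the p flag means that only lines where the substitution succeeded are printed. If the number of spaces varies, you may want to put a * after each space in the pattern, so that " *" matches zero or more spaces.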
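As a sketch of the variable-spacing variant, here each space after "From:" and "To:" is followed by *. The second and third input lines are hypothetical: one has extra spaces, the other does not match at all and is therefore dropped by -n/p. Only POSIX BRE syntax is used, so no -E is needed.

```shell
#!/bin/sh
# Line 1 is from the question; lines 2 and 3 are made up for illustration.
input=$(printf '%s\n' \
  'From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}' \
  'From:   <sip:972526134663@server>;tag=1.To:  <sip:972526134664@server>}' \
  'some unrelated log line')

# " *" matches zero or more spaces; -n plus the p flag prints only the
# lines where the substitution actually happened.
csv=$(printf '%s\n' "$input" |
  sed -n 's/^From: *<sip:\([0-9]*\).*To: *<sip:\([0-9]*\).*/\1,\2/p')

printf '%s\n' "$csv"
```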

Upvotes: 1
