Youri

Reputation: 417

Find results with grep and write to file

I would like to get all the matches for a pattern from a file on my computer, using grep or egrep.

I just worked out that a string of the form

'+33. ... ... ..'

should be matched by the following regex:

'\+33.[0-9].[0-9].[0-9].[0-9].' Or is this not correct?

My grep command is:

grep '\+31.[0-9].[0.9].[0.9].[0-9]' Samsung\ GT-i9400\ Galaxy\ S\ II.xry  >> resultaten.txt

The output file contains only the following:

"Binary file Samsung GT-i9400 .xry matches"

... and none of the actual matches are written.

Can someone please help me get the results and write them to a file?

Upvotes: 0

Views: 934

Answers (2)

tripleee

Reputation: 189427

Firstly, the default behavior of grep is to print the line containing a match. Because binary files do not contain lines, it only prints a message when it finds a match in a binary file. However, this can be overridden with the -a flag.

But then, you end up with the problem that the "lines" it prints are not useful. You probably want to add the -o option to only print the substrings which actually matched.
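As a small sketch of these two options, the snippet below builds a throwaway file containing NUL bytes (so grep treats it as binary) and compares the default behavior with -a and -o. The file contents and the phone number are invented for illustration, and the options are assumed to be those of GNU or BSD grep, since neither -a nor -o is in POSIX.

```shell
# Throwaway sample file; the NUL bytes make grep classify it as binary.
sample=$(mktemp)
printf 'abc\000\001+33123456789\000xyz\n' > "$sample"

# Default: grep only prints a notice that the binary file matches
# (the exact wording, and whether it goes to stderr, varies by version).
grep '+33' "$sample"

# -a treats the file as text; -o prints only the matched substring.
match=$(grep -ao '+33[0-9]*' "$sample")
printf '%s\n' "$match"

rm -f "$sample"
```

Without -o, the second command would print the whole "line", control bytes included, which is rarely what you want to see in a terminal or a results file.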

Finally, your regex isn't correct at all. The lone dot . is a metacharacter which matches any character, including a control character or other non-text character. Given the length of your regex, you are unlikely to catch false positives, but you might want to explain what you want the dot to match. I have replaced it with [ ._-] which matches a space and some punctuation characters which are common in phone numbers. Extend or change it depending on what punctuation you expect in your phone numbers.

In regular grep, a plus simply matches itself. With grep -E the syntax changes, and you need to backslash the plus to match it literally. Without that option, the backslash is superfluous, and actually wrong in some dialects. In GNU grep, for example, a backslashed plus selects the extended meaning (one or more repetitions). That is strictly a syntax error at the beginning of the pattern, where there is no preceding expression to repeat, but GNU grep silently ignores it rather than reporting an error.
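A quick way to see the difference is to run both syntaxes against the same two lines. This is only a sketch with made-up input; it sticks to the forms that behave the same across grep implementations (an unescaped + in basic syntax, and both forms under -E).

```shell
# Basic syntax: an unescaped + is an ordinary character.
bre=$(printf 'a+b\naab\n' | grep 'a+b')

# Extended syntax: + repeats the preceding item one or more times ...
ere_repeat=$(printf 'a+b\naab\n' | grep -E 'a+b')

# ... so a literal plus has to be escaped.
ere_literal=$(printf 'a+b\naab\n' | grep -E 'a\+b')

printf 'BRE a+b: %s\nERE a+b: %s\nERE a\\+b: %s\n' "$bre" "$ere_repeat" "$ere_literal"
```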

On the other hand, your number groups are also wrong. [0-9] matches a single digit, where apparently the intention is to match multiple digits. For convenience, I will use the grep -E extension which enables + to match one or more repetitions of the preceding expression. Then we also get access to ? to mark the punctuation expressions as optional.
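To illustrate the quantifiers, here is a minimal sketch on invented input; the variable names are only for the demonstration.

```shell
# [0-9] consumes exactly one digit, so -o reports four separate matches.
single_count=$(printf 'tel 0612\n' | grep -o '[0-9]' | wc -l | tr -d ' ')

# [0-9]+ (with -E) consumes the whole run of digits in one match.
run=$(printf 'tel 0612\n' | grep -Eo '[0-9]+')

# ? makes the separator optional, so both spellings match.
optional_count=$(printf '06-12\n0612\n' | grep -Ec '^[0-9]+-?[0-9]+$')

printf 'single: %s matches, run: %s, optional: %s lines\n' \
    "$single_count" "$run" "$optional_count"
```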

Wrapping up, try this:

grep -Eao '\+33[0-9]+([ ._-]?[0-9]+){3}' \
   'Samsung GT-i9400 Galaxy S II.xry' >resultaten.txt

In human terms, this requires a literal +33 followed by one or more digits, then three more groups of one or more digits, each optionally preceded by a space or punctuation character.

This will overwrite resultaten.txt which is usually what you want; the append operation you had also makes sense in many scenarios, so change it back if that's actually what you want.
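If you want to try this without the real .xry file, the following sketch runs the same kind of pattern, with the separator class [ ._-], against a throwaway file. The numbers are invented and the NUL bytes merely stand in for binary data; it also shows the difference between > and >>.

```shell
# Throwaway files; sample data is made up for illustration.
sample=$(mktemp)
out=$(mktemp)
printf 'x\000+331 23 45 67\000y\000+33612345678\000z\000+3312\n' > "$sample"

pattern='\+33[0-9]+([ ._-]?[0-9]+){3}'

# > truncates the output file first, so repeated runs do not accumulate.
grep -Eao "$pattern" "$sample" > "$out"
grep -Eao "$pattern" "$sample" > "$out"
overwritten=$(cat "$out")

# >> appends, so the same two matches are now listed twice.
grep -Eao "$pattern" "$sample" >> "$out"
appended_lines=$(wc -l < "$out" | tr -d ' ')

printf '%s\n' "$overwritten"
rm -f "$sample" "$out"
```

The trailing +3312 is not reported because the pattern needs at least four digits after the 33.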

If each dot in your template +33. ... ... .. represents a required digit, and the spaces represent required punctuation, the following is closer to what you attempted to specify:

\+33[0-9]([ ._-][0-9]{3}){2}[ ._-][0-9]{2}

That is, there is one required digit after 33, then two groups of exactly three digits and one of two, each group preceded by one non-optional spacing or punctuation character.
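A rough check of this stricter form, using the separator class [ ._-] on invented numbers, might look like the following.

```shell
# Stricter pattern: one digit, two groups of three, one group of two,
# each group preceded by exactly one separator character.
strict='\+33[0-9]([ ._-][0-9]{3}){2}[ ._-][0-9]{2}'

# Only the first two lines have the required grouping and separators.
matches=$(printf '%s\n' \
    '+331 234 567 89' \
    '+331-234.567_89' \
    '+33123456789' \
    '+331 23 456 78' | grep -Eo "$strict")

printf '%s\n' "$matches"
```

The unseparated +33123456789 is rejected here, whereas the looser pattern with optional separators would accept it.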

(Your exposition has +33 while your actual example has +31. Use whichever is correct, or perhaps allow any sequence of numbers for the country code, too.)

Upvotes: 1

Paul Evans

Reputation: 27577

It means that you're finding a match, but the file you're grepping isn't a text file; it's a binary file containing non-printable bytes. If you really want to grep that file, try:

strings Samsung\ GT-i9400\ Galaxy\ S\ II.xry | grep '+31.[0-9].[0-9].[0-9].[0-9]' >> resultaten.txt

Upvotes: 1
