aironman

Reputation: 869

Find negative numbers in file with grep

I have this script that reads a file. The file looks like this:

711324865,438918283,2
-333308476,886548365,2
1378685449,-911401007,2
-435117907,560922996,2
259073357,714183955,2
...

The script:

#!/bin/bash
while IFS=, read childId parentId parentLevel
do
       grep "\$parentId" parent_child_output_level2.csv
       resul=$?
       echo "child is $childId, parent is $parentId parentLevel is $parentLevel resul is $resul"
done < parent_child_output_level1.csv

But it is not working: resul is always 1 (no match), which is a false negative.

I know that because I can run the following command, which I think is equivalent:

[core@dub-vcd-vms165 generated-and-saved-to-hdfs]$ grep "\-911401007" parent_child_output_level2.csv
-911401007,-157143722,3

Please help.

Upvotes: 1

Views: 10492

Answers (2)

Avinash Raj

Reputation: 174844

This grep command prints only the negative numbers:

$ grep -oP '(^|,)\K-\d+' file.csv
-333308476
-911401007
-435117907
  • (^|,) matches the start of a line or a comma.
  • \K discards everything matched so far, so the comma is not printed.
  • -\d+ matches a - followed by one or more digits.
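
The -P option is not available in every grep (BSD grep on macOS lacks it, for example). As a rough sketch of the same idea without \K, you can let an extended regex match the leading comma and strip it afterwards with tr. This swaps in a different technique from the command above, and the sample rows are simply the ones from the question:

```shell
# Sample rows copied from the question.
input=$(printf '%s\n' \
    '711324865,438918283,2' \
    '-333308476,886548365,2' \
    '1378685449,-911401007,2' \
    '-435117907,560922996,2' \
    '259073357,714183955,2')

# (^|,) anchors the minus sign to the start of a field; the matched
# comma is part of the output, so tr removes it afterwards.
negatives=$(printf '%s\n' "$input" | grep -oE '(^|,)-[0-9]+' | tr -d ',')

printf '%s\n' "$negatives"
```

On this sample it prints the same three numbers as the -P version.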

Upvotes: 3

bgoldst

Reputation: 35324

Your title is inconsistent with your question. The title asks how to grep for negative numbers, which Avinash Raj answered well, although I'd suggest you don't even need the (Perl-style) (^|,)\K prefix, which acts like a positive look-behind, to match start-of-field: if the file is well-formed, -\d+ matches all the negative numbers just as well. So you could just run (edit: realized that with a leading - you need -- to prevent grep from taking the pattern as an option):

grep -oP -- '-\d+' file.csv;
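
To illustrate the option-parsing point, here is a small sketch showing two ways of passing a pattern that starts with a dash: the -- end-of-options marker, and the -e option, which names the pattern explicitly. It uses -E with [0-9] instead of -P with \d so that it does not depend on PCRE support; the sample rows come from the question.

```shell
sample=$(printf '%s\n' '711324865,438918283,2' '1378685449,-911401007,2')

# "--" ends option parsing, so the next argument is taken as the pattern.
with_dashdash=$(printf '%s\n' "$sample" | grep -oE -- '-[0-9]+')

# "-e" introduces the pattern explicitly, which also avoids the ambiguity.
with_e=$(printf '%s\n' "$sample" | grep -oE -e '-[0-9]+')

printf '%s\n' "$with_dashdash" "$with_e"
```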

Your question includes a script whose intention seems to be to grep for any number (positive or negative) in the first field (childId) of one file (parent_child_output_level2.csv) that occurs in the second field (parentId) of another file (parent_child_output_level1.csv). To accomplish this, I wouldn't use grep, because you're trying to do an exact numerical equality test, which can even be done as an exact string equality test assuming your numbers are always consistently represented (e.g. no redundant leading zeroes). Repeatedly grepping through the entire file just to search for a number in one column is also wasteful of CPU.
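
If you would rather keep the grep loop, note that "\$parentId" hands grep the literal text $parentId rather than the variable's value, so nothing ever matches and the exit status is always 1. A minimal sketch of the loop with that fixed follows. The temporary files stand in for your two CSV files, and anchoring the pattern to the first field is an assumption about what should count as a match:

```shell
# Stand-ins for the two CSV files; rows are taken from the question.
level1=$(mktemp)
level2=$(mktemp)
printf '%s\n' '1378685449,-911401007,2' '259073357,714183955,2' > "$level1"
printf '%s\n' '-911401007,-157143722,3' > "$level2"

found=""
while IFS=, read -r childId parentId parentLevel; do
    # $parentId is left unescaped so the shell expands it; "^" and ","
    # restrict the match to a complete first field.
    if grep -q -e "^${parentId}," "$level2"; then
        found="$found$parentId "
    fi
done < "$level1"

echo "$found"
rm -f "$level1" "$level2"
```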

Here's what I would do:

parentIdList=($(cut -d, -f2 parent_child_output_level1.csv));
childIdList=($(cut -d, -f1 parent_child_output_level2.csv));
for parentId in "${parentIdList[@]}"; do
    for childId in "${childIdList[@]}"; do
        if [[ "$childId" == "$parentId" ]]; then
            echo "$parentId";
        fi;
    done;
done;

With this approach, you precompute both the parent id list and the child id list just once, using cut to extract the appropriate field from each file. Then you can use the shell-builtin for loop, shell-builtin if conditional, and shell-builtin [[ test command to accomplish the check, and finally finish with a shell-builtin echo to print the matches. Everything is shell-builtin, after the initial command substitutions that run the cut external executable.
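
For large files the nested loop compares every parent id with every child id. As a hedged alternative, and a different technique from the loop above, grep can do the exact string comparison in a single pass when given the parent ids as fixed-string, whole-line patterns. The temporary files and the extra 5,6,3 row are made up for illustration. Unlike the loop, this prints matches in the order of the second file and prints each matching line only once:

```shell
# Stand-ins for the two CSV files.
level1=$(mktemp)
level2=$(mktemp)
parents=$(mktemp)
printf '%s\n' '1378685449,-911401007,2' '259073357,714183955,2' > "$level1"
printf '%s\n' '-911401007,-157143722,3' '5,6,3' > "$level2"

# Field 2 of the first file becomes a list of patterns, one per line.
cut -d, -f2 "$level1" > "$parents"

# -F: plain strings, not regexes; -x: the whole line must equal a pattern;
# -f: read the patterns from a file, so a leading "-" is never an option.
matches=$(cut -d, -f1 "$level2" | grep -Fx -f "$parents")

echo "$matches"
rm -f "$level1" "$level2" "$parents"
```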

If you also want to filter these results on negative numbers, you could grep for ^- in the results of the above script, or grep for it in the results of each (or just the first) cut command, or add the following line just inside the outer for loop:

if [[ "${parentId:0:1}" != '-' ]]; then continue; fi;

Alternative approach:

if [[ "$parentId" != -* ]]; then continue; fi;

Either approach will skip non-negatives.
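
Both tests above rely on bash features. If the script ever has to run under a plain POSIX sh, a case statement is one possible way to do the same prefix check. A small sketch, with ids taken from the question's sample data:

```shell
negatives=""
for id in 711324865 -333308476 -911401007 560922996; do
    case $id in
        -*) negatives="$negatives$id " ;;  # starts with "-": keep it
        *)  continue ;;                    # non-negative: skip
    esac
done

echo "$negatives"
```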

Upvotes: 2
