Geoffrey Saunders

Reputation: 135

Print lines that are not numbers

I have a CSV file (over 5 gigabytes) with one column that should contain only integers. Some of the values are not integers, and I want to scan the file and capture the line numbers and, preferably, the values that are not integers. I've tried a number of things, such as using masks, but to no avail.

For example, we have the following csv table:

ID
5342
76375
sdfg23
2342lslf
jfijfojwo
395-34425
abc-24523
afhfhue3224

I would want to know that the values in rows 3, 4, 5, 6, 7, and 8 (counting from the first data row, header excluded) are not integers. The output would look like this (as a dataframe/table equivalent):

+-------------+------+
| ID          | Row  |
+-------------+------+
| sdfg23      | 3    |
| 2342lslf    | 4    |
| jfijfojwo   | 5    |
| 395-34425   | 6    |
| abc-24523   | 7    |
| afhfhue3224 | 8    |
+-------------+------+

Or even just spilling the line numbers to standard out would be really helpful.

I've tried things like sed, for example: sed -n '/?![[:digit:]]=' csvfile.csv

Upvotes: 0

Views: 351

Answers (2)

Wiktor Stribiżew

Reputation: 626699

You may use grep to find all lines that are numeric and invert the result:

grep -vE '^[0-9]+(\.[0-9]+)?$' file

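The ^[0-9]+(\.[0-9]+)?$ pattern (POSIX ERE syntax, enabled with -E) matches a whole line that consists of digits only, such as 111, or of digits with a fractional part, such as 111.111111. The -v option then inverts the match, so only the remaining, non-numeric lines are printed. If decimals should also be reported as non-integers, drop the optional (\.[0-9]+)? group.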

See the online grep demo:

s="11.1111
5342
76375
sdfg23
2342lslf
jfijfojwo
395-34425
abc-24523
afhfhue3224"
grep -vE '^[0-9]+(\.[0-9]+)?$' <<< "$s"

Output:

sdfg23
2342lslf
jfijfojwo
395-34425
abc-24523
afhfhue3224
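
Since the question asks for integers only, and for line numbers, a variation on the command above may be closer to what is needed. The following is a minimal sketch, not part of the original answer: it drops the optional decimal group and adds -n so that grep prefixes each hit with its line number. The function name non_integer_lines and the temporary sample file are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print "line_number:content" for every line of file $1
# that does not consist solely of digits. Empty lines are reported as well.
non_integer_lines() {
  grep -nvE '^[0-9]+$' "$1"
}

# Sample data from the question, header included, written to a temporary file.
sample=$(mktemp)
printf '%s\n' ID 5342 76375 sdfg23 2342lslf jfijfojwo \
  395-34425 abc-24523 afhfhue3224 > "$sample"

non_integer_lines "$sample"
```

Note that the header line (ID) is reported as line 1, because it is not an integer either, and that the numbers are file line numbers, so sdfg23 shows up as line 4 rather than row 3.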

Upvotes: 1

Sundeep

Reputation: 23667

You can check whether a line contains any non-digit character; every such line is not an integer. Note that the header line matches as well.

$ # the -n option prefixes each output line with its line number
$ grep -n '[^0-9]' ip.txt
1:ID
4:sdfg23
5:2342lslf
6:jfijfojwo
7:395-34425
8:abc-24523
9:afhfhue3224

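If you need further processing, awk is a good fit. The command below is just an example that you can adapt to your needs; it prints NR-1 so that row numbers start at the first data line rather than at the header.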

$ awk 'NR==1{print "ID Row"; next} /[^0-9]/{print $0, NR-1}' ip.txt
ID Row
sdfg23 3
2342lslf 4
jfijfojwo 5
395-34425 6
abc-24523 7
afhfhue3224 8
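
If the integer column is one of several in the file, the same idea can be restricted to a single field. This is only a sketch under assumptions that the question does not confirm: the file is plain comma-separated with no quoted fields containing commas, and the helper name bad_values_in_column, the second column, and the sample rows are all made up for illustration.

```shell
# Hypothetical helper: report values in column $2 of file $1 that are not
# plain unsigned integers, with their data-row number (header excluded).
# Assumes simple comma-separated input without quoted commas.
bad_values_in_column() {
  awk -F, -v col="$2" '
    NR == 1 { print $col, "Row"; next }          # reuse the header name
    $col !~ /^[0-9]+$/ { print $col, NR - 1 }    # empty fields are reported too
  ' "$1"
}

# Made-up two-column sample, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'ID,Name' '5342,a' 'sdfg23,b' '76375,c' '395-34425,d' > "$sample"

bad_values_in_column "$sample" 1
```

For this sample the helper prints the header line ID Row followed by sdfg23 2 and 395-34425 4. Anchoring the check with ^[0-9]+$ rather than searching for [^0-9] means empty fields are flagged as well.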

Upvotes: 3
