Reputation: 937
I have a large dataset that is a CSV file and I've already cleaned it up quite a bit. However there are instances where I have a column which is not numeric and I want to remove those rows.
So far I thought something like this would work. I am thinking the solution is probably simple. I am not sure whether I can even do something like this. It is a CSV file so I tried to tell awk that its delimited by commas. Also an example of a good value in column one is this: 323870133825187840
awk '/,/$1 != numeric'
Are there any suggestions on this one. I initially thought about doing some kind of a one liner such as, [0-9], etc. I would greatly appreciate it if anyone could help me out with this. For someone out there this is probably like child's play :)
Upvotes: 3
Views: 3450
Reputation: 753525
Assuming that you're only dealing with unsigned integer values, you can use either:
awk -F, '$1 ~ /^[[:digit:]]+$/'
or
awk -F, '$1 !~ /[^[:digit:]]/'
Both use the 'implicit print' action. The first checks that $1
only contains digits; the second checks that $1
does not contain a non-digit.
If your numbers are more general (signed, possibly with decimal point, possibly with exponential notation — 6.0221413e+23 (Avogadro's number), for example), then you need a more complex regular expression and will use positive matching, only selecting rows where field 1 matches your regex for a valid number.
Upvotes: 5
Reputation: 784958
You can simply grep it:
grep -P '^[+-]?\d*(\.\d+)?(?<=.),' file
This will return lines if it finds decimal/integer number in first column with optional =/-
sign.
For ex following numbers will be matched in column 1:
6.72
1235.3
72
.66
2.8
+3.5
-5.9
PS: Empty column value in 1st column will not be matched.
Upvotes: 1