Aashu
Aashu

Reputation: 1287

Remove all non integer row from a csv file with sed/grep

I have tried using cat /home/kgh/abc.CSV | awk -F, '$2 ~ /^[[:digit:]]+$/' . It is checking condition at column level.

cat /home/kgh/abc.CSV
1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
1,010116085144,125,125,124,123, 000,00,232,28,294522,077\82,041755,0074,0013
1,010116085149,125,125,124,123,^@000,00,232,28,294522,088,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

expected o/p

1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013   
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

I am not able to check at row level as cat /home/kgh/abc.CSV | awk -F, '$0 ~ /^[[:digit:]]+$/' . I cannot put this condition for other columns because it contains different number of columns in each row.I want to check only numeric data in my csv file otherwise I want to delete that row from my csv file.Is there any efficient way to do this ? Thanks.

Edit

I am worry why the below mentioned solution is not working with sed and grep too

root@aa:/home/kgh/# cat abc.CSV
1,010116084135,118,115,113,115,-368,09,225,28,294579,077,044677,0074,0013
1,010116084146,000,000,000,000,000,09,227,28,294536,077,044271,0074,0013
1,010116084151,122,121,118,119,-099,05,228,28,294547,077,044150,0074,0013
1,010116084156,121,117,117,116,-244,^@06,228,28,294557,077,044047,0074,0013
root@aa:/home/kgh# sed -n '/^[0-9, -]*$/p' abc.CSV
root@aa:/home/kgh# sed -rn '/^-?[0-9]+(, ?-?[0-9]+)*$/p' abc.CSV  
root@aa:/home/kgh# grep -v '[^0-9, -]' abc.CSV

What am I missing ?

Edited

file abc.CSV
abc.CSV: ASCII text, with CRLF line terminators

I opened a file to see CRLF line terminators in vi using :set list

1,010116084135,118,115,113,115,-368,09,225,28,294579,077,044677,0074,0013$
1,010116084146,000,000,000,000,000,09,227,28,294536,077,044271,0074,0013$
1,010116084151,122,121,118,119,-099,05,228,28,294547,077,044150,0074,0013$
1,010116084156,121,117,117,116,-244,^@06,228,28,294557,077,044047,0074,0013$

Hence no result with sed or grep.

I have removed CRLF line terminators now its working with grep and sed.

sed 's/'"$(printf '\015')"'//g' abc.CSV | sed '/[^0-9, -]/d'
1,010116084135,118,115,113,115,-368,09,225,28,294579,077,044677,0074,0013
1,010116084146,000,000,000,000,000,09,227,28,294536,077,044271,0074,0013
1,010116084151,122,121,118,119,-099,05,228,28,294547,077,044150,0074,0013

Upvotes: 0

Views: 497

Answers (5)

David C. Rankin
David C. Rankin

Reputation: 84531

In addition to the sed print command, you can alternatively use the sed delete command to delete any lines that contain characters other than '0-9, -'. For example:

$sed '/[^0-9, -]/d' abc.csv
1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

note: to 'edit-in-place', you can use the '-i' option. Additionally, you can use the '-i.bak' option to have sed create abc.csv.bak containing the original file with the modified file in abc.csv.

Upvotes: 0

Mohammad Mustaqeem
Mohammad Mustaqeem

Reputation: 1084

Another method using grep: grep '^[0-9, -]*$' abc.csv

Upvotes: 0

jyvet
jyvet

Reputation: 2191

With grep:

grep -v '[^0-9, -]' abc.CSV

Upvotes: 1

Benjamin W.
Benjamin W.

Reputation: 52112

This prints all lines that consist of numbers, separated by commas and an optional space; the numbers are prepended by an optional unary minus.

$ sed -rn '/^-?[0-9]+(, ?-?[0-9]+)*$/p' infile
1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

Without extended regex (-r), the parentheses would have to be escaped, ? becomes \{0,1\} and + becomes \{1,\}.

Upvotes: 0

Cyrus
Cyrus

Reputation: 88563

With GNU sed:

sed -n '/^[0-9, -]*$/p' abc.csv

If you want to edit your file "in place" add sed's option -i.

Output:

1,010116085134,125,125,124,123, 000,00,232,28,294522,077,041755,0074,0013
1,010116085139,125,125,124,123,-012,00,232,28,294522,077,041755,0074,0013
2,010116084424,15954,15593,14034
2,010116084616,15651,15366,12804

See: The Stack Overflow Regular Expressions FAQ

Upvotes: 1

Related Questions