Reputation: 1

How to delete lines containing only numbers and punctuation marks in a huge text file

My 5-million-word Kannada text file contains lines like these:

, .
, , , .
, .
2005 .
, , 878 .
, .
2008 .
- , 751 .
- .

I need to delete lines of this kind.

Upvotes: 0

Answers (1)

konsolebox

Reputation: 75488

Using sed:

sed -n  '/^[[:punct:][:digit:][:space:]]\+$/!p' file
sed     '/^[[:punct:][:digit:][:space:]]\+$/d' file
sed -nr '/^[[:punct:][:digit:][:space:]]+$/!p' file
sed -r  '/^[[:punct:][:digit:][:space:]]+$/d' file

Using awk:

awk '!/^[[:punct:][:digit:][:space:]]+$/' file

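Another way is to print only the lines that contain alphabetic characters. Note that [A-Za-z] matches ASCII letters only, so for Kannada text the [[:alpha:]] variants are the ones to use, and they should be run in a UTF-8 locale with a locale-aware tool (GNU sed, grep or gawk, for instance):

awk '/[[:alpha:]]/' file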
awk '/[A-Za-z]/' file
sed -n '/[[:alpha:]]/p' file
sed '/[A-Za-z]/!d' file

Of course, you can use sed with -i to edit the file in place (the suffix after -i keeps a backup of the original):

sed -i.bak ...
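
As a rough sketch of what the in-place variant could look like end to end: the file name sample.txt, the temporary directory and the sample lines below are made up for illustration (ASCII words stand in for the Kannada text), and -E is used here instead of -r or \+ on the assumption that your sed accepts it, as GNU and BSD sed do.

```shell
# Hypothetical sample file in a scratch directory; ASCII words stand in for Kannada text.
workdir=$(mktemp -d)
file="$workdir/sample.txt"
printf '%s\n' 'first line of text' ', .' '2005 .' 'second line, 2008' '- , 751 .' > "$file"

# Delete lines made up only of punctuation, digits and whitespace, editing in place.
# The untouched original is kept next to it as sample.txt.bak.
sed -i.bak -E '/^[[:punct:][:digit:][:space:]]+$/d' "$file"

# Show what is left.
cat "$file"
```

If the result is not what you wanted, the .bak copy can simply be moved back over the edited file.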

Forgot grep:

grep -v '^[[:punct:][:digit:][:space:]]\+$' file
grep '[[:alpha:]]' file
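
With a file that large it may be worth checking what the pattern matches before deleting anything. The following is only a sketch: the paths, the variable names (removed, kept) and the sample lines are hypothetical, and grep -E is assumed in place of the \+ form used above.

```shell
# Hypothetical sample file in a scratch directory; ASCII words stand in for Kannada text.
workdir=$(mktemp -d)
file="$workdir/sample.txt"
pattern='^[[:punct:][:digit:][:space:]]+$'
printf '%s\n' 'first line of text' ', .' '2005 .' 'second line, 2008' '- , 751 .' > "$file"

# Count the lines that would be deleted.
removed=$(grep -cE "$pattern" "$file")

# Preview the first few of them, with their line numbers.
grep -nE "$pattern" "$file" | head -n 3

# Write the filtered text to a new file instead of touching the original.
grep -vE "$pattern" "$file" > "$workdir/cleaned.txt"
kept=$(grep -c '' "$workdir/cleaned.txt")

echo "removed=$removed kept=$kept"
```

On this sample the last line reports removed=3 kept=2. Because the pattern ends in +, completely empty lines are not matched and stay in the output.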

Upvotes: 1
