Reputation: 29371
I have a set of 10 CSV files, which normally have entries of this kind:
a,b,c,d
d,e,f,g
Now, due to some error, entries in these files have become like this:
a,b,c,d
d,e,f,g
,,,
h,i,j,k
Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.
Is there a command you would recommend that can replace the erroneous lines in all the files?
Upvotes: 1
Views: 1360
Reputation: 21
Yes, awk or grep are very good options if you are working on a Linux platform. On other platforms you can use Perl regular expressions, together with its join and split functions.
Upvotes: 0
Reputation: 425
Most simply:
$ grep -v ',,,' oldfile > newfile
$ mv newfile oldfile
Upvotes: 1
Reputation: 753845
It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:
grep -v '^,,,$' old-file.csv > new-file.csv
Note that this deletes just those lines with exactly three commas. If you want to delete malformed lines that contain any number of commas (including zero) and no other characters, then:
grep -v '^,*$' ...
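Since the question asks about all the files, here is a minimal sketch of applying that `grep -v '^,*$'` filter to several files in turn. The function name `strip_comma_lines` is hypothetical, and the sketch assumes `mktemp` is available (it is on typical Linux systems). Note that this pattern also removes completely empty lines.

```shell
# Hypothetical helper: remove lines made up only of commas (or empty lines)
# from every file named on the command line, rewriting each file in place.
strip_comma_lines() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        grep -v '^,*$' "$f" > "$tmp"
        status=$?
        # grep exits with 1 when no lines are selected, which is not an
        # error here; only a status above 1 signals a real failure.
        if [ "$status" -gt 1 ]; then
            rm -f "$tmp"
            return "$status"
        fi
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}

# Usage (run in the directory holding the CSV files):
#   strip_comma_lines *.csv
```

It is worth trying this on copies of the files first, as the rewrite is not reversible.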
There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.
Check out Mastering Regular Expressions.
Upvotes: 5
Reputation: 131600
Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:
sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...
To replace: well, see wnoise's answer, or if you don't want to create new files with the output,
sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...
or
sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...
(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:
egrep -v '^,+$' < oldfile.csv > newfile.csv
I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.
EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile) and the \+ modifier (which can be replaced with \{1,\}).
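Putting those two substitutions together, a version of the delete command that avoids the GNU-only features might look like the sketch below. The function name `delete_comma_lines_posix` is hypothetical, and `mktemp` is assumed to be available even though it is not itself part of POSIX.

```shell
# Hypothetical helper: delete lines consisting of one or more commas,
# using \{1,\} instead of \+ and a temporary file instead of -i.
delete_comma_lines_posix() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Only overwrite the original if sed finished successfully.
        if sed -e '/^,\{1,\}$/d' "$f" > "$tmp"; then
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}

# Usage:
#   delete_comma_lines_posix yourfile1.csv yourfile2.csv
```

Unlike a `^,*$` pattern, this leaves completely empty lines in place, because it requires at least one comma.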
Upvotes: 1
Reputation: 1126
What about keeping only the lines that match the desired format, instead of handling one exception?
If the provided input is what you really want to match:
grep -E '[a-z],[a-z],[a-z],[a-z]' < oldfile.csv > newfile.csv
If the input is different, provide it; the regular expression should not be too hard to write.
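As a rough sketch of the same keep-only-valid-lines idea without tying it to single lowercase letters, awk can check the fields directly. This assumes that a valid row has exactly four non-empty comma-separated fields and no quoted commas, which may not hold for the real data; the function name `keep_four_field_rows` is hypothetical.

```shell
# Hypothetical helper: print only rows with exactly four non-empty fields.
# Assumes plain CSV with no quoted fields containing commas.
keep_four_field_rows() {
    awk -F, '
        NF == 4 {
            ok = 1
            # Reject the row if any of its fields is empty.
            for (i = 1; i <= NF; i++)
                if ($i == "") ok = 0
            if (ok) print
        }
    ' "$@"
}

# Usage:
#   keep_four_field_rows oldfile.csv > newfile.csv
```

If empty fields are legitimate in the real files, this check is too strict and the comma-only patterns from the other answers are the safer choice.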
Upvotes: 1
Reputation: 14743
Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use
awk '$0 !~ /,,,/ {print}' <old-file.csv > new-file.csv
Upvotes: 1
Reputation: 9922
sed 's/,,,/replacement/' < old-file.csv > new-file.csv
optionally followed by mv new-file.csv old-file.csv
Upvotes: 2