Reputation: 31925
I have a CSV file, example.csv, that contains:
hello,world,wow
this,is,amazing
I want to get the elements of the first column. At first I wrote a sed command like this:
sed -n 's/\([^,]*\),*/\1/p' example.csv
output:
helloworld,wow
thisis,amazing
Then I modified my command to the following and got what I wanted:
sed -n 's/\([^,]*\).*/\1/p' example.csv
output:
hello
this
In the first command I used a comma (,) after the group; in the second I replaced the comma with a dot (.), and it works as expected. Can anyone explain how sed produces the first output? What's the story behind it? Is it because of the dot (.) or because of the capture group and back-reference?
Upvotes: 1
Views: 73
Reputation: 106483
In both regexes, \([^,]*\) consumes the same part of the string: all the characters preceding the first comma. The difference is in how the remaining parts of those regexes are treated.
In the first one, it's ,* (zero or more commas). All it can consume here is the comma itself; the rest of the line isn't covered by the pattern.
In the second one, it's .* (zero or more of any character). Not surprisingly, that covers the remaining string completely, as it has nothing to stop at; any is, well, any.
In both cases, the part of the string the pattern matches is replaced by the contents of the capturing group (which, as I said, is all the characters before the first comma), and whatever the rest of the regex matched is simply removed. Since there is no g flag, sed makes only this one substitution per line. So in the first case only the first comma is erased; in the second, the comma and the rest of the line.
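A quick way to see which part of each line each pattern actually covers is to replace the match with itself wrapped in brackets, using sed's & (the whole matched text) instead of \1. This is a small illustrative sketch, assuming a POSIX-compatible sed; the input is fed through printf rather than read from example.csv:

```shell
# Wrap the whole match (&) in brackets to show what each regex covers.
# printf stands in for example.csv so the demo is self-contained.

# Pattern 1: group + ",*" covers only the first field and the comma after it
printf 'hello,world,wow\nthis,is,amazing\n' | sed 's/\([^,]*\),*/[&]/'
# [hello,]world,wow
# [this,]is,amazing

# Pattern 2: group + ".*" covers the entire line
printf 'hello,world,wow\nthis,is,amazing\n' | sed 's/\([^,]*\).*/[&]/'
# [hello,world,wow]
# [this,is,amazing]
```

Everything outside the brackets is untouched by the substitution, which is exactly the text left over in the first command's output.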
Upvotes: 3
Reputation: 2857
The reason is that the pattern matches only the first part of the line, i.e. only the hello, part is replaced. The ,* part matches an arbitrary number of commas, and nothing follows it in the pattern, so nothing else on the line is matched. For example:
hello,,,,,,,,,,,,,,,,,,world
would be turned into
helloworld
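This can be reproduced directly. A minimal sketch, assuming any POSIX sed: the first command shows ,* swallowing a whole run of commas, and the second (a made-up input, a,,b,,c) shows that without the g flag only the first match on the line is replaced:

```shell
# ",*" consumes the entire run of commas after "hello", so they all disappear
printf 'hello,,,,,,,,,,,,,,,,,,world\n' | sed 's/\([^,]*\),*/\1/'
# helloworld

# Without the g flag, s/// replaces only the first match on each line,
# so the second run of commas survives
printf 'a,,b,,c\n' | sed 's/\([^,]*\),*/\1/'
# ab,,c
```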
A good example would be
sed -n 's/\([^,]*\),*$/\1/p' example.csv
This trims the commas, but it extracts the first column only when all the commas are at the end of the line, e.g.
hello,,,,,,
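To see what the $-anchored version does on both kinds of input, here is a small sketch (again assuming a POSIX sed and printf-fed input). On a line like hello,world,wow, the leftmost place where \([^,]*\),*$ can match is the last field, wow, which is replaced by itself, so the line is printed unchanged rather than reduced to its first field:

```shell
# Line 1: commas only at the end, so the match starts at "hello" and the trailing commas are trimmed.
# Line 2: the leftmost possible match is "wow" at the end; it is replaced by itself,
#         and -n with p still prints the line because a substitution was made.
printf 'hello,,,,,,\nhello,world,wow\n' | sed -n 's/\([^,]*\),*$/\1/p'
# hello
# hello,world,wow
```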
Hope this makes the problem a bit clearer.
Upvotes: 1
Reputation: 41460
If you want the first field, why not use awk?
awk -F, '{print $1}' file
hello
this
Using sed with a back-reference:
sed -nr 's/([^,]*),.*/\1/p' file
hello
this
To make it work, you need the .* so that the pattern matches the whole line.
The -r option enables extended regular expressions, so you don't need to escape the parentheses as \( and \).
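One practical difference between the awk and sed commands above shows up on a line with no comma at all. A minimal sketch with a made-up input line, nocomma: awk prints the whole line as $1, while the sed version prints nothing for it, because ,.* requires a comma and -n suppresses lines where no substitution happened. It uses -E, which GNU sed accepts as a synonym for -r and which BSD/macOS sed also supports:

```shell
# awk: with -F, the first field of a comma-free line is the whole line
printf 'hello,world,wow\nnocomma\n' | awk -F, '{print $1}'
# hello
# nocomma

# sed: the pattern needs a literal comma, so "nocomma" never matches
# and -n keeps it from being printed
printf 'hello,world,wow\nnocomma\n' | sed -nE 's/([^,]*),.*/\1/p'
# hello
```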
Upvotes: 0
Reputation: 6421
Can I suggest not using sed?
cut -d, -f1 example.csv
Personally, I'm a huge sed fan, but cut is much more appropriate in this instance.
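For comparison, a small sketch of cut on the question's data plus a hypothetical comma-free line. By default cut passes lines that contain no delimiter through unchanged; the -s option suppresses them instead:

```shell
# -d, sets the delimiter, -f1 selects the first field
printf 'hello,world,wow\nthis,is,amazing\nnocomma\n' | cut -d, -f1
# hello
# this
# nocomma

# -s drops lines that contain no delimiter
printf 'hello,world,wow\nthis,is,amazing\nnocomma\n' | cut -s -d, -f1
# hello
# this
```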
Upvotes: 0