Reputation: 55
How can I remove the substrings starting with # and everything after #?
There are many of them on different lines; they all start with # and are at the end of the line, and the number at the end is always different. They are all 15 characters long; I want to delete everything from # through the end of the line, with sed or awk.
http://www.somesite/play/episodes/xyz/fred-episode-110#group=p02q32xl
http://www.somesite/play/episodes/abc/simon-episode-266#group=p03d924k
http://www.somesite/play/episodes/qwe/mum-episode-39#group=p03l1jpr
http://www.somesite/play/episodes/zxc/dad-episode-41#group=p03l1j9s
http://www.somesite/play/episodes/asd/bob-episode-57#group=p03l1j7g
Upvotes: 0
Views: 69
Reputation: 52536
With cut, declare # as the field separator and print only the first field:
cut -d '#' -f 1 infile
With sed, replace everything from # on with the empty string:
sed 's/#.*//' infile
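The sed command above only prints to standard output, so the original file is left untouched. A minimal sketch of keeping the result, assuming a scratch directory and a two-line sample input copied from the question (the names tmpdir and outfile are made up for illustration):

```shell
# Scratch directory so the demo does not touch real files.
tmpdir=$(mktemp -d)

# Hypothetical sample input: two URLs copied from the question.
printf '%s\n' \
  'http://www.somesite/play/episodes/xyz/fred-episode-110#group=p02q32xl' \
  'http://www.somesite/play/episodes/qwe/mum-episode-39#group=p03l1jpr' \
  > "$tmpdir/infile"

# sed writes to standard output; redirect to a new file to keep the result.
sed 's/#.*//' "$tmpdir/infile" > "$tmpdir/outfile"

cat "$tmpdir/outfile"
```

Writing to a separate file and only then replacing the original avoids truncating the input while it is still being read.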
With awk, declare # as the field separator and print the first field:
awk -F'#' '{ print $1 }' infile
With Bash, taking advantage of the fact that it's always the last 15 characters:
while IFS= read -r line; do
    echo "${line:0:-15}"
done < infile
Note that this a) is very slow and b) requires Bash 4.2-alpha or newer, which is needed for the negative length value in the parameter expansion.
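If the 15-character length cannot be relied on, or an older shell is in use, a suffix-removing parameter expansion is a possible alternative to the loop above. The following is only a sketch: strip_fragment is a hypothetical helper name, and the input is fed from printf rather than from infile. The expansion ${line%%#*} is POSIX, so it should not depend on a particular Bash version, but it is still a line-by-line shell loop and therefore slow on large files.

```shell
# strip_fragment (hypothetical helper): reads lines on standard input and
# prints each one with everything from the first '#' onward removed.
strip_fragment() {
    # The '|| [ -n "$line" ]' part also handles a final line that lacks
    # a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # %%#* deletes the longest suffix matching '#*',
        # that is, the first '#' and everything after it.
        printf '%s\n' "${line%%#*}"
    done
}

# Usage with one URL from the question and one line without a '#'.
printf '%s\n' \
  'http://www.somesite/play/episodes/asd/bob-episode-57#group=p03l1j7g' \
  'no-hash-here' | strip_fragment
```

Lines that contain no # are printed unchanged, which matches the behavior of the sed and awk variants.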
With Perl, split on #, take the first field of the list, and print it with say to include a newline:
perl -nE 'say ((split /#/)[0])' infile
or, more concise and sed-ish (pointed out by mklement0):
perl -pe 's/#.*//' infile
Upvotes: 3
Reputation: 440677
To complement Benjamin W.'s helpful answer, grep is another option.
If you do NOT want to include the #:
grep -Eo '^[^#]+' file
If you DO want to include the #:
grep -Eo '^[^#]+.' file
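One difference from sed is worth keeping in mind: with -o, grep prints only the matched parts, so a line that has no match (for example an empty line, or one that starts with #) is dropped from the output instead of being printed as an empty line. A small sketch comparing the two, assuming a made-up three-line sample (the variable names are illustrative, and -o is an extension found in GNU and BSD grep rather than a POSIX option):

```shell
# Hypothetical sample: a URL from the question, a line that starts with '#',
# and a line without any '#'.
sample=$(printf '%s\n' \
  'http://www.somesite/play/episodes/qwe/mum-episode-39#group=p03l1jpr' \
  '#only-a-fragment' \
  'no-fragment-here')

# grep -o prints only matches, so the '#only-a-fragment' line disappears.
grep_out=$(printf '%s\n' "$sample" | grep -Eo '^[^#]+')

# sed keeps every line; the '#only-a-fragment' line becomes an empty line.
sed_out=$(printf '%s\n' "$sample" | sed 's/#.*//')

printf 'grep:\n%s\n\nsed:\n%s\n' "$grep_out" "$sed_out"
```

For input like the question's, where every line is a URL followed by a fragment, both approaches should give the same result.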
Upvotes: 1