Reputation: 209
I have found a few answers to similar questions where the author wants to remove text after a certain character in a string (example). I would like to do a similar thing; however, the character I wish to use is ".", specifically when there are three occurrences, e.g. "...". Whenever I use the commands I have found, all of the characters are removed.
Example input file: InFile.txt:
GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_11158-16380_64... *
GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_73046-78268_63... at 100.00%
GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_338336-343558_32... at 100.00%
GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_482256-487478_64... at 100.00%
GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_429502-434724_64... at 100.00%
Example command 1: awk -F'...' '{print $1}' InFile.txt
Output of Example command 1 is just blank space.
I have tried putting the characters in "" e.g.
Example command 2: awk -F'"..."' '{print $1}' InFile.txt
Which produces this output:
GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_11158-16380_64... *
GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_73046-78268_63... at 100.00%
GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_338336-343558_32... at 100.00%
GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_482256-487478_64... at 100.00%
GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_429502-434724_64... at 100.00%
Ideally, I'd like the output to look like this:
GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_11158-16380_64
GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_73046-78268_63
GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_338336-343558_32
GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_482256-487478_64
GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_429502-434724_64
How do I remove text after "..." without replacing all of the text?
Thanks, Jamie
Upvotes: 3
Views: 925
Reputation: 784998
The answer from @RavinderSingh13 is great, as it shows how to properly handle regex metacharacters in FS.
Here is an alternate awk that doesn't use any regex and hence doesn't need any escaping:
awk '{print substr($0, 1, index($0, "...") - 1)}' file
GCA_000260195.2_FO_II5_V1_genomic.fna_Candidate_Sequence_11158-16380_64
GCA_000350365.1_Foc4_1.0_B2_genomic.fna_Candidate_Sequence_73046-78268_63
GCA_001696625.1_C1HIR_9889_genomic.fna_Candidate_Sequence_338336-343558_32
GCA_007994515.1_UK0001_genomic.fna_Candidate_Sequence_482256-487478_64
GWHAASU00000000_FocTR4_58.genomic.fna_Candidate_Sequence_429502-434724_64
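One caveat with the command above: if a line contains no "...", index() returns 0 and substr() prints an empty line. Assuming you would rather pass such lines through unchanged, a guard could look like the sketch below. The function name strip_after_dots and the sample lines are my own, purely for illustration.

```shell
# Hypothetical helper (name is illustrative): keep the text before the
# first literal "...", or the whole line when "..." does not occur.
strip_after_dots() {
    awk '{
        i = index($0, "...")                  # 0 when "..." is absent
        print (i ? substr($0, 1, i - 1) : $0)
    }' "$@"
}

# Made-up sample lines, one with "..." and one without
printf '%s\n' 'abc_64... at 100.00%' 'no_dots_here' | strip_after_dots
```

It reads standard input when called without arguments, or you can call it as strip_after_dots InFile.txt.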
Upvotes: 3
Reputation: 133458
Could you please try the following? You need to escape the dot to tell awk to treat . as a literal character rather than as the regex metacharacter that matches any character.
awk -F'\\.\\.\\.' '{print $1}' Input_file
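OR, as per Sundeep sir's comment, use the shorter form below (it relies on interval expressions such as {3}, which some older awk versions may not support):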
awk -F'\\.{3}' '{print $1}' Input_file
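Correcting OP's attempt: you need NOT write the field separator as -F'"..."'; the " characters are not needed here, so use only -F'your_delimiter'. In the first attempt, -F'...' is read as a regex in which each . matches any character, so the first three characters of every line become the separator and $1 is empty. In the second attempt the regex also requires literal " characters, which never occur in the input, so nothing is split and $1 is the whole line.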
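A quick way to see how the separator is interpreted is to run the variants on a single line. This is only a small sketch; the sample text in line is made up and not taken from the OP's file, and the square brackets are there just to make an empty field visible.

```shell
# Made-up sample line (an assumption, not from the OP's file)
line='sample_64... at 100.00%'

# Unescaped: each . matches any character, so the first three
# characters act as the separator and $1 is empty.
printf '%s\n' "$line" | awk -F'...' '{print "[" $1 "]"}'

# Quoted inside the separator: the regex needs literal " characters,
# finds none, and $1 is the whole line.
printf '%s\n' "$line" | awk -F'"..."' '{print "[" $1 "]"}'

# Escaped: only three literal dots separate the fields.
printf '%s\n' "$line" | awk -F'\\.\\.\\.' '{print "[" $1 "]"}'
```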
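Bonus solution: in case one doesn't want to use a field separator, use sub here. It deletes everything from the first literal ... to the end of the line, and the trailing 1 makes awk print each (edited) line.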
awk '{sub(/\.\.\..*/,"")} 1' Input_file
Upvotes: 4