Reputation: 1621
Assume that I have this text:
eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125 eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+Hm[A1sg] : 21.7978515625
I want to remove everything from the second space onward. The output should be:
eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]
Upvotes: 1
Views: 5361
Reputation: 6272
You can also use a simple cut to do the job:
~$ echo 'eskitirim ... ' | cut -d' ' -f-2 # or -f1,2
# eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]
~$ echo 'eskitirim ... ' | cut -d':' -f1
# eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] (plus a trailing space, because the text before the colon ends in one)
Upvotes: 2
Reputation: 241178
You could use a capturing group to capture everything before the second space:
(.*?\s.*?)\s.*
And then replace everything with the first capturing group match.
So (.*?\s.*?)\s.* replaced with \1 would output:
eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]
Alternatively, you could also replace . with \S:
(\S*\s\S*)\s.*
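As a rough sketch of how this substitution could be run from a shell, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do): POSIX regular expressions have no lazy quantifier such as .*?, and \s and \S are extensions, so the sketch writes the second pattern with bracket expressions instead. The function name first_two_fields is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep everything before the second whitespace character.
# [^[:space:]]* stands in for \S*, and [[:space:]] stands in for \s.
first_two_fields() {
    printf '%s\n' "$1" |
        sed -E 's/^([^[:space:]]*[[:space:]][^[:space:]]*)[[:space:]].*$/\1/'
}

line='eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125 eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+Hm[A1sg] : 21.7978515625'

# Prints: eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]
first_two_fields "$line"
```

A line with fewer than two whitespace characters does not match the pattern, so sed passes it through unchanged.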
Upvotes: 2
Reputation: 2466
If you are absolutely certain that the format (as to spacing) will always be exactly as you've shown it in the question, a simpler solution might be appropriate, but I would dig deeper into the semantics of your data to give a more robust solution.
1) If spacing could possibly vary but you definitely want only the first two whitespace-delimited fields, use awk '{print $1,$2}'.
2) If the : is significant and guaranteed to be present, I would use that rather than spaces to delimit what you are after: awk -F: '{print $1}'.
3) I would not recommend any sed/regex solution unless there can be more than one sequential space and it is critical to preserve the exact amount of such space.
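To make the difference between options 1 and 2 concrete, here is a small sketch run on a made-up line with irregular spacing (the sample text is an assumption, not taken from the question's data). The brackets in the final printf are only there to make leading and trailing blanks visible.

```shell
#!/bin/sh
# Made-up sample with runs of spaces, for illustration only.
line='eskitirim    eski[Verb]+YHm[A1sg]  :  20.4453125'

# Option 1: awk's default field splitting treats any run of blanks as one
# separator, and print joins the two fields with a single space.
by_fields=$(printf '%s\n' "$line" | awk '{print $1, $2}')

# Option 2: split on the colon instead; the text before it is kept verbatim,
# including its internal and trailing spaces.
by_colon=$(printf '%s\n' "$line" | awk -F: '{print $1}')

printf '[%s]\n' "$by_fields" "$by_colon"
```

Option 1 normalizes the spacing to a single space, while option 2 preserves it exactly and leaves the blanks that preceded the colon at the end of the result.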
Upvotes: 4