JayGatsby

Reputation: 1621

Removing chars after second space

Assume that I have this text:

eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125 eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+Hm[A1sg] : 21.7978515625

I want to remove everything after the second space. Output should be:

eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]

Upvotes: 1

Views: 5361

Answers (3)

Giuseppe Ricupero

Reputation: 6272

You can also use a simple cut to do the job:

~$ echo 'eskitirim ... ' | cut -d' ' -f-2        # or -f1,2
# eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]

~$ echo 'eskitirim ... ' | cut -d':' -f1
# eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]   (the space before the colon is kept as a trailing space)

Upvotes: 2

Josh Crozier

Reputation: 241178

You could use a capturing group to capture everything before the second space:

(.*?\s.*?)\s.*

And then replace everything with the first capturing group match.

So (.*?\s.*?)\s.* replaced with \1 would output:

eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg]

Alternatively, you could replace each lazy .*? with \S* (non-whitespace characters only):

(\S*\s\S*)\s.*

Same output.
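
As a rough sketch of the capture-group approach applied with sed: POSIX sed has no lazy quantifier and \S is a GNU extension, so the example below swaps in the bracket expression [^ ]* and assumes the separators are single literal spaces. The function name first_two_fields is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep everything before the second space on each line.
# [^ ]* stands in for \S* (assumption: fields are separated by plain spaces).
# Lines containing fewer than two spaces do not match and pass through unchanged.
first_two_fields() {
    sed 's/^\([^ ]* [^ ]*\) .*$/\1/'
}

line='eskitirim eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+YHm[A1sg] : 20.4453125 eski[Verb]-t[Verb+Caus]+[Pos]+Hr[Aor]+Hm[A1sg] : 21.7978515625'
printf '%s\n' "$line" | first_two_fields
```

On the sample line from the question this should print only the first two space-separated fields.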

Upvotes: 2

Jeff Y

Reputation: 2466

If you are absolutely certain that the spacing will always be exactly as shown in the question, a simpler solution might be appropriate. Otherwise, I would look more closely at the semantics of your data to arrive at a more robust solution:

1) If spacing could possibly vary but you definitely want only the first two non-space-containing sequences, use awk '{print $1,$2}'.

2) If the : is significant and guaranteed to be present, I would use that rather than spaces to delimit what you are after: awk -F: '{print $1}'.

3) I would not recommend any sed/regex solution unless there can be more than one sequential space and it is critical to preserve the exact amount of such space.
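
Options 1 and 2 above could be wrapped up roughly as follows. This is only a sketch: the function names are invented for the example, the input lines are shortened stand-ins for the question's data, and splitting on ' : ' (rather than a bare colon) is an assumption that each colon is surrounded by single spaces, made so that no trailing space is left behind.

```shell
#!/bin/sh
# Option 1 (hypothetical wrapper): default awk field splitting,
# so any run of spaces or tabs between fields collapses to one space.
two_fields_awk() {
    awk '{print $1, $2}'
}

# Option 2 (hypothetical wrapper): split on " : " instead of spaces.
# Assumption: every colon separator has exactly one space on each side.
# Spacing inside the first field group is preserved as-is.
before_colon_awk() {
    awk -F' : ' '{print $1}'
}

printf '%s\n' 'eskitirim   eski[Verb] : 20.4' | two_fields_awk
printf '%s\n' 'eskitirim eski[Verb] : 20.4' | before_colon_awk
```

Both calls should print the word and its first analysis. The difference is that the first normalizes the spacing between them, while the second keeps whatever spacing was in the input.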

Upvotes: 4
