Reputation: 19
I have to write a single function that should return the first word in the following strings:
("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"
Each call has to return the first word. As you can see, some strings start with whitespace, contain apostrophes, or have a comma right after the first word.
I've used the following options:
    return text.split()[0]
    return re.split(r'\w*', text)[0]
Both fail on some of the strings. Can anyone help?
Upvotes: 0
Views: 8289
Reputation: 1
I've done this by reading characters up to the first occurrence of whitespace. Something like this:
    stringVariable = "whatever sentence"
    firstWord = ""
    stringVariableLength = len(stringVariable)
    for i in range(0, stringVariableLength):
        if stringVariable[i] != " ":
            firstWord = firstWord + stringVariable[i]
        else:
            break
This code walks through the string you want the first word of and appends each character to a new variable called firstWord until it reaches the first occurrence of whitespace. I'm not exactly sure how you would put that into a function, as I'm pretty new to this whole thing, but I'm sure it could be done!
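For what it's worth, the loop can be wrapped in a function fairly directly. This is only a sketch: the name first_word_by_space is made up for illustration, and the lstrip() call is an added assumption so that input with leading whitespace, such as " a word ", does not produce an empty string. It still stops only at a space, so punctuation attached to the word is kept.

```python
def first_word_by_space(text):
    """Return the characters before the first space (hypothetical helper)."""
    # Assumption: drop leading whitespace first; without this, " a word "
    # would hit a space immediately and yield an empty string.
    text = text.lstrip()
    first_word = ""
    for ch in text:
        if ch == " ":
            break
        first_word += ch
    return first_word


print(first_word_by_space("Hello world"))         # Hello
print(first_word_by_space(" a word "))            # a
print(first_word_by_space("greetings, friends"))  # greetings,
```

Note the trailing comma in the last result: this approach alone does not cover the punctuation cases from the question.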
Upvotes: 0
Reputation:
You can try something like this:
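    import re

    pattern = r"[a-zA-Z']+"

    def first_word(text):
        # Collect runs of letters and apostrophes, then return the first
        # run that starts with a letter (skips runs that begin with an apostrophe).
        for candidate in re.findall(pattern, text):
            if candidate[0].isalnum():
                return candidate

    print(first_word("don't touch it"))
Output:
    don't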
Upvotes: 0
Reputation: 1856
Try the below code. I tested with all your inputs and it works fine.
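    import re

    text = ["Hello world", " a word ", "don't touch it", "greetings, friends", "... and so on ...", "hi"]
    # Compile once, outside the loop; the raw string keeps \w from being treated as a string escape.
    rgx = re.compile(r"(\w[\w']*\w|\w)")
    for i in text:
        out = rgx.findall(i)
        print(out[0])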
Output:
Hello
a
don't
greetings
and
hi
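Since the question asks for a single function, here is a minimal sketch of the same pattern wrapped in one. It uses re.search, which stops at the first match, instead of findall. The name WORD_RE and the choice to return None when no word is found are assumptions, not part of the original snippet.

```python
import re

# Same pattern as in the snippet above: a word character, optionally followed
# by word characters or apostrophes, ending on a word character.
WORD_RE = re.compile(r"\w[\w']*\w|\w")


def first_word(text):
    """Return the first word in text, or None if there is none (assumed behavior)."""
    match = WORD_RE.search(text)
    return match.group(0) if match else None


samples = ["Hello world", " a word ", "don't touch it",
           "greetings, friends", "... and so on ...", "hi"]
print([first_word(s) for s in samples])
# ['Hello', 'a', "don't", 'greetings', 'and', 'hi']
```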
Upvotes: 2
Reputation: 41168
A non-regex solution: strip leading punctuation/whitespace characters, split the string to get the first word, then remove trailing punctuation/whitespace:
    from string import punctuation, whitespace

    def first_word(s):
        to_strip = punctuation + whitespace
        return s.lstrip(to_strip).split(' ', 1)[0].rstrip(to_strip)

    tests = [
        "Hello world",
        "a word",
        "don't touch it",
        "greetings, friends",
        "... and so on ...",
        "hi"]

    for test in tests:
        print('#{}#'.format(first_word(test)))
Outputs:
#Hello#
#a#
#don't#
#greetings#
#and#
#hi#
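One caveat: split(' ', 1) splits only on a literal space, so tab- or newline-separated input comes back whole. A possible variant, sketched here under the assumption that any run of whitespace should separate words, uses split(None, 1) instead. The name first_word_any_whitespace and the empty-string fallback are illustrative choices.

```python
from string import punctuation, whitespace

TO_STRIP = punctuation + whitespace


def first_word_any_whitespace(s):
    """Variant that splits on any whitespace run, not just a single space."""
    parts = s.lstrip(TO_STRIP).split(None, 1)
    # split(None, 1) returns [] for an empty string, so guard the lookup.
    return parts[0].rstrip(TO_STRIP) if parts else ""


print(first_word_any_whitespace("hello\tworld"))       # hello
print(first_word_any_whitespace("... and so on ..."))  # and
```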
Upvotes: 1
Reputation: 1230
Try this one:
    >>> import re
    >>> def pm(s):
    ...     p = r"[a-zA-Z][\w']*"
    ...     m = re.search(p, s)
    ...     print(m.group(0))
    ...
Test results:
>>> pm("don't touch it")
don't
>>> pm("Hello w")
Hello
>>> pm("greatings, friends")
greatings
>>> pm("... and so on...")
and
>>> pm("hi")
hi
Upvotes: 1
Reputation: 59426
It is tricky to distinguish apostrophes that are part of a word from single quotes used as punctuation. But since your input examples do not contain single quotes, I can go with this:
    re.match(r'\W*(\w[^,. !?"]*)', text).groups()[0]
This works for all your examples. It won't work for atypical input like "'tis all in vain!", though. It assumes that words end at commas, dots, spaces, exclamation marks, question marks, and double quotes. This list can be extended on demand (in the brackets).
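As a rough sketch of how this could look as a function: the version below adds ';' and ':' to the bracketed list purely as an example of extending it, and returns None when there is no word at all, since re.match returns None in that case and calling .groups() on it would raise. The names FIRST_WORD_RE and first_word are illustrative.

```python
import re

# The original character class plus ';' and ':' (added here only to show how
# the list of word-ending characters can be extended).
FIRST_WORD_RE = re.compile(r'\W*(\w[^,. !?";:]*)')


def first_word(text):
    """Return the first word, or None when text has no word character."""
    match = FIRST_WORD_RE.match(text)
    # re.match returns None if nothing matches, e.g. for "" or "...".
    return match.group(1) if match else None


print(first_word("greetings, friends"))  # greetings
print(first_word("wait; what"))          # wait
print(first_word("'tis all in vain!"))   # tis (the leading apostrophe is lost)
```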
Upvotes: 1