Reputation: 1273
There is an abundance of questions on here about String.split() and regex, but none of them seem to pertain to my dilemma here ...
I have the following:
String a = "@USER_78b1ff36 just a hunch............ You "
        + "two seem to know your baseball, and may have been teammates before....";
String[] splitTweet = a.split("\\.+|\\s+|\\*+|\\,+|\\!+|\"|\\-|/|\\:");
printArray(splitTweet); // prints each index followed by its value, one per line
OUTPUT:
0: @USER_78b1ff36
1: just
2: a
3: hunch
4:
5: You
6: two
7: seem
8: to
9: know
10: your
11: baseball
12:
13: and
14: may
15: have
16: been
17: teammates
18: before
I'm getting these empty entries, and they seem to occur only after single instances of punctuation. Whitespace is split as expected, and repeated punctuation is split as expected ...
What am I doing wrong with my expression? (I'm sure there are multiple things; this is the first time I've tried using split().) I want only words, but I do need to keep @ and # when they're attached to a token.
Upvotes: 1
Views: 167
Reputation: 281446
"baseball, and" splits into "baseball", "", "and" because ", " is two delimiters. Your + quantifiers only allow runs of a single kind of delimiter. If you want to split on runs of different delimiters, put + around the whole thing rather than the parts:
a.split("(\\.|\\s|\\*|\\,|\\!|\"|\\-|/|\\:)+");
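To see the difference side by side, here is a minimal sketch that runs both patterns on a short piece of the tweet. The class name SplitDemo and the constant names are made up for illustration, and the character-class pattern is an assumed shorter spelling of the same delimiter set, not something from the question.

```java
import java.util.Arrays;

public class SplitDemo {
    // Pattern from the question: each alternative has its own quantifier,
    // so "," and " " are matched as two separate delimiters.
    static final String PER_DELIMITER = "\\.+|\\s+|\\*+|\\,+|\\!+|\"|\\-|/|\\:";

    // Pattern from this answer: one quantifier over the whole group,
    // so any mixed run of delimiters is matched as a single delimiter.
    static final String RUNS = "(\\.|\\s|\\*|\\,|\\!|\"|\\-|/|\\:)+";

    // Assumption: the same delimiter set written as a character class.
    static final String RUNS_CLASS = "[.\\s*,!\"\\-/:]+";

    public static void main(String[] args) {
        String text = "your baseball, and may";

        // The empty string appears between "," and " ".
        System.out.println(Arrays.toString(text.split(PER_DELIMITER)));

        // ", " is consumed as one delimiter, so no empty string.
        System.out.println(Arrays.toString(text.split(RUNS)));
        System.out.println(Arrays.toString(text.split(RUNS_CLASS)));
    }
}
```

The first line printed is [your, baseball, , and, may]; the other two are [your, baseball, and, may]. Since @, # and _ are not in the delimiter set, tokens such as @USER_78b1ff36 stay intact.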
Upvotes: 5