Ethan Powers

Reputation: 70

Are numpy string arrays faster than Python strings?

I am building a string that is about 30 million words long. As you can imagine, this takes forever to create with a for-loop that appends roughly 100 words at a time. Is there a way to represent the string in a more memory-friendly way, such as a numpy array? I have very little experience with numpy.

bigStr = ''
for tweet in df['text']:
    bigStr = bigStr + ' ' + tweet  # copies the entire string built so far on every iteration
len(bigStr)

Upvotes: 0

Views: 790

Answers (2)

chepner

Reputation: 531135

If you want to build a string, use ' '.join, which creates the final string in O(n) time. Building it up one piece at a time takes O(n^2) time, because each concatenation copies everything accumulated so far.

bigStr = ' '.join(df['text'])  # join accepts any iterable of strings; no list comprehension needed
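
To make the difference concrete, here is a minimal sketch that times both approaches with timeit on a synthetic list of 10,000 short strings (the fake tweets are an assumption standing in for df['text'], not the asker's data):

import timeit

# Hypothetical stand-in for df['text']: 10,000 short strings.
tweets = ['word ' * 10] * 10_000

def concat_loop():
    big = ''
    for t in tweets:
        big = big + ' ' + t  # repeated copying of the growing string
    return big

def concat_join():
    return ' '.join(tweets)  # sizes are summed once, data copied once

print('loop:', timeit.timeit(concat_loop, number=1))
print('join:', timeit.timeit(concat_join, number=1))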

Upvotes: 1

niaei

Reputation: 2399

I can see you're trying to get the length of all the data. For that you don't need to concatenate the strings at all. (I also see that you add a whitespace for each element.)

Just take the length of each tweet and add it to an integer variable (+1 for each whitespace):

number_of_texts = 0
for tweet in df['text']:
    number_of_texts += 1 + len(tweet)  # +1 accounts for the joining space

print(number_of_texts)
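
If the goal is just the total length, the loop can also be collapsed into a single expression; a minimal sketch, assuming df['text'] contains only strings:

# Sum of tweet lengths, plus one joining space per tweet.
number_of_texts = sum(1 + len(tweet) for tweet in df['text'])
print(number_of_texts)

# pandas equivalent using the vectorized string accessor:
# number_of_texts = (df['text'].str.len() + 1).sum()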

Upvotes: 0
