Ethan Powers

Reputation: 70

Are numpy string arrays faster than Python strings?

I am building a string that is about 30 million words long. As you can imagine, this takes forever to create with a for-loop that appends roughly 100 words at a time. Is there a way to represent the string in a more memory-friendly way, such as a numpy array? I have very little experience with numpy.

bigStr = ''
for tweet in df['text']:
    bigStr = bigStr + ' ' + tweet  # copies the entire string built so far on every iteration
len(bigStr)

Upvotes: 0

Views: 790

Answers (2)

chepner

Reputation: 531135

If you want to build a string, use ' '.join, which creates the final string in O(n) time. Building it up one piece at a time takes O(n^2) time, because each concatenation copies everything accumulated so far.

bigStr = ' '.join(df['text'])  # join accepts any iterable of strings; no list comprehension needed
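
To make the difference concrete, here is a minimal sketch that times both approaches with timeit on a synthetic list of 10,000 short strings (the fake tweets are an assumption standing in for df['text'], not the asker's data):

import timeit

# Hypothetical stand-in for df['text']: 10,000 short strings.
tweets = ['word ' * 10] * 10_000

def concat_loop():
    big = ''
    for t in tweets:
        big = big + ' ' + t  # repeated copying of the growing string
    return big

def concat_join():
    return ' '.join(tweets)  # sizes are summed once, data copied once

print('loop:', timeit.timeit(concat_loop, number=1))
print('join:', timeit.timeit(concat_join, number=1))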

Upvotes: 1

niaei

Reputation: 2399

I can see you're trying to get the length of all the data. For that you don't need to concatenate the strings at all. (I also see that you add a whitespace for each element.)

Just take the length of each tweet and add it to an integer variable (+1 for each whitespace):

number_of_texts = 0
for tweet in df['text']:
    number_of_texts += 1 + len(tweet)  # +1 accounts for the joining space

print(number_of_texts)
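
If the goal is just the total length, the loop can also be collapsed into a single expression; a minimal sketch, assuming df['text'] contains only strings:

# Sum of tweet lengths, plus one joining space per tweet.
number_of_texts = sum(1 + len(tweet) for tweet in df['text'])
print(number_of_texts)

# pandas equivalent using the vectorized string accessor:
# number_of_texts = (df['text'].str.len() + 1).sum()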

Upvotes: 0
