Reputation: 77
I need to replace slang acronyms within a string with their expansions. The slang dataset I use is this one, with over 3k items. This is my current code for the process:
import pandas as pd
slangs = pd.read_csv('slang.csv', index_col=[0])
def expand_slang_acronyms():
    word_list = 'foo brb bar'.split(' ')
    for i in range(len(word_list)):
        for j in range(len(slangs)):
            if word_list[i] == slangs.loc[j, 'acronym']:
                word_list[i] = slangs.loc[j, 'expansion']
    print(' '.join(word_list)) # 'foo be right back bar'
Running it once is quite fast, but I need to replace thousands of strings. Timing just 100 calls of the function:
from timeit import timeit
timeit(expand_slang_acronyms, number=100)
In this instance it output 6.519681000005221 (seconds), which is really slow considering it's only 100 calls. I need a faster way to do this.
Upvotes: 1
Views: 63
Reputation: 3749
I think there are many ways to do this. Here is one way to speed up the process.
from timeit import timeit
import pandas as pd
slangs = pd.read_csv('slang.csv')
# Build the lookup table once, outside the function
slang_dict = dict(zip(slangs['acronym'], slangs['expansion']))
def expand_slang_acronyms():
    word_list = 'foo brb bar'.split(' ')
    for i in range(len(word_list)):
        # Dictionary membership test instead of scanning every row
        if word_list[i] in slang_dict:
            word_list[i] = slang_dict[word_list[i]]
    print(' '.join(word_list)) # 'foo be right back bar'
timeit(expand_slang_acronyms, number=100)
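This should result in a performance boost, as dictionary lookups are O(1) on average, compared to O(n) for scanning all n rows of the DataFrame for every word. Building the dictionary is a one-time O(n) cost, so it should be done once, outside the function.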
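Since you mention thousands of strings, here is a minimal sketch of the same dictionary idea as a reusable function. The name `expand_slang`, the `sample_slang` dictionary and the `texts` list are made up for illustration; in practice you would pass the `slang_dict` built from your CSV.

```python
def expand_slang(text, slang_dict):
    """Replace every whitespace-separated word found in slang_dict with its expansion."""
    # dict.get(word, word) returns the expansion if present, else the word unchanged
    return ' '.join(slang_dict.get(word, word) for word in text.split())

# Hypothetical stand-in for the dictionary built from slang.csv
sample_slang = {'brb': 'be right back', 'idk': "I don't know"}

# Hypothetical input strings
texts = ['foo brb bar', 'idk brb']
expanded = [expand_slang(t, sample_slang) for t in texts]
print(expanded)
```

Note that `text.split()` splits on any run of whitespace, so repeated spaces are collapsed, unlike `split(' ')`. Matching is also case-sensitive and exact, so 'BRB' or 'brb,' would not be expanded unless you normalise the words first.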
Upvotes: 2