Reputation: 420
Input df(example)
Country SubregionA SubregionB
BRA State of Acre Brasiléia
BRA State of Acre Cruzeiro do Sul
USA AL Bibb County
USA AL Blount County
USA AL Bullock County
Output df
Country SubregionA SubregionB
BRA State of Acre State of Acre - Brasiléia
BRA State of Acre State of Acre - Cruzeiro do Sul
USA AL AL Bibb County
USA AL AL Blount County
USA AL AL Bullock County
The code snippet is quite self explanatory, but when executed seems to run forever. What could be going wrong(Also the dataframe 'data
' is quite large around 250K+ rows)
for row in data.itertuples():
region = data['Country']
if region == 'ARG' :
data['SubregionB'] = data[['SubregionA' 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
elif region == 'BRA' :
data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
elif region == 'USA':
data['SubregionB'] = data[['SubregionA', 'SubregionB']].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
else:
pass
Explanation : Trying to join columns SubregionA and SubregionB based on values in the column name 'Country'. The separators are different and thus have written multiple if-else statements. Takes too long to execute, how can I make this faster?
Upvotes: 0
Views: 37
Reputation: 862701
You can use numpy.select
with Series.isin
and join columns with +
:
print (df)
Country SubregionA SubregionB
0 BRA State of Acre Brasilia
1 BRA State of Acre Cruzeiro do Sul
2 USA AL Bibb County
3 USA AL Blount County
4 USA AL Bullock County
5 JAP AAA BBBB
reg1 = ['ARG','BRA']
reg2 = ['USA']
a = np.select([df['Country'].isin(reg1), df['Country'].isin(reg2)],
[df['SubregionA'] + ' - ' + df['SubregionB'],
df['SubregionA'] + ' ' + df['SubregionB']],
default=df['SubregionB'])
df['SubregionB'] = a
print (df)
Country SubregionA SubregionB
0 BRA State of Acre State of Acre - Brasilia
1 BRA State of Acre State of Acre - Cruzeiro do Sul
2 USA AL AL Bibb County
3 USA AL AL Blount County
4 USA AL AL Bullock County
5 JAP AAA BBBB
Upvotes: 1