Khaine775

Reputation: 2765

Suboptimal for loop on large-ish dataset

So I have a DataFrame with several thousand rows containing artificial forex trading data. The first ten rows look like this:

[image: first ten rows of the DataFrame]

I want to iterate over this set and, for each row, calculate the value in a common currency, which in this case is USD. So for each row, I look at the CurrencyPair, DeskRate and OrderQty columns and calculate a CommonCurrency value:

for i in range(len(order_data)):
    if order_data['CurrencyPair'][i] == 'GBP/USD':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'AUD/USD':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'EUR/USD':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] * order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'USD/CHF':
        order_data['CommonCurrency'][i] = order_data['DeskRate'][i] / order_data['OrderQty'][i]
    elif order_data['CurrencyPair'][i] == 'EUR/GBP':
        pass  # different calculation

This does not seem like the right way of doing it, especially if there's a large number of different currency pairs. Another problem comes up when I get to EUR/GBP, because there I need the DeskRate from both GBP/USD and EUR/USD, and I can't see how to do that with this method.

Any hints?

Upvotes: 0

Views: 36

Answers (1)

Kevin K.

Reputation: 1397

One useful feature of pandas is its indexing. There may be more idiomatic ways of doing this, but with loc and a boolean mask you can assign values to a subset of the DataFrame's rows directly from Series (columns):

order_data.loc[order_data['CurrencyPair'].isin(('GBP/USD', 'AUD/USD', 'EUR/USD')), 'CommonCurrency'] = order_data['DeskRate'] * order_data['OrderQty']
order_data.loc[order_data['CurrencyPair'] == 'USD/CHF', 'CommonCurrency'] = order_data['DeskRate'] / order_data['OrderQty']
# some_func is a placeholder for whatever calculation EUR/GBP needs
order_data.loc[order_data['CurrencyPair'] == 'EUR/GBP', 'CommonCurrency'] = some_func(order_data['DeskRate'], order_data['OrderQty'])

This avoids any explicit for loops.
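If the number of pairs grows, one possible variation is to build the masks once and let numpy.select pick the formula per row. The following is only a sketch: the rows in the small DataFrame are made up for illustration, the formulas are copied from the question as-is, and EUR/GBP is deliberately left as NaN because its calculation is not specified here.

```python
import numpy as np
import pandas as pd

# Made-up illustrative rows; only the column names come from the question.
order_data = pd.DataFrame({
    "CurrencyPair": ["GBP/USD", "USD/CHF", "EUR/USD", "EUR/GBP"],
    "DeskRate": [1.25, 0.8, 1.1, 0.88],
    "OrderQty": [1000.0, 2000.0, 500.0, 300.0],
})

pair = order_data["CurrencyPair"]
usd_is_quote = pair.isin(["GBP/USD", "AUD/USD", "EUR/USD"])  # pairs of the form XXX/USD
usd_is_base = pair == "USD/CHF"                               # pairs of the form USD/XXX

# np.select evaluates the conditions in order and, for each row, takes the
# choice belonging to the first condition that is True. Rows that match no
# condition (EUR/GBP here) receive the default value.
order_data["CommonCurrency"] = np.select(
    [usd_is_quote, usd_is_base],
    [
        order_data["DeskRate"] * order_data["OrderQty"],
        order_data["DeskRate"] / order_data["OrderQty"],  # formula as written in the question
    ],
    default=np.nan,
)

print(order_data)
```

Supporting a new pair then only means adding it to the relevant mask, or adding one more condition/choice entry, rather than another elif branch.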

Upvotes: 2
