Reputation: 93
I load a daily report with:
pd.read_csv(filepath, sep='\t')
which gives a DataFrame like the one below (simplified):
finalDf2 = pd.DataFrame(dict(
    Portfolio = pd.Series(['Book1', 'Book1', 'Book2', 'Book3', 'Book1', 'Book1']),
    Strike = pd.Series(['108', '109.10', '111', '114', '108.3', '115.0']),
    Notional = pd.Series(['0', '-0.02', '35', '. 3K', '-0.05K', '0']),
))
By running the below on various entries under the "Notional" column:
type(finalDf2.iloc[ , ])
I see the 0s are of type int already.
The nonzero values however are strings. I tried to convert strings to floats by using:
finalDf2['Notional'].astype(float)
but before doing so, how could I convert all cells containing "K" values? For instance,
. 3K should end up being float or int 30
-0.05K should end up being float or int -50
Spacings are actually in the file and thus dataframe unfortunately.
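The per-cell type check described above can also be done for the whole column at once. A minimal sketch, assuming a column that mixes int zeros with strings as described; the sample Series here is illustrative, not the actual report:

```python
import pandas as pd

# Illustrative column (assumption): int zeros mixed with string values
notional = pd.Series([0, '-0.02', '35', '. 3K', '-0.05K', 0], name='Notional')

# Map every cell to its Python type and count the occurrences of each type
type_counts = notional.map(type).value_counts().to_dict()
print(type_counts)
```

This shows at a glance which types are present before deciding how to convert them.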
Upvotes: 1
Views: 306
Reputation: 77027
First, replace the spaces with '0'.
In [344]: s = finalDf2['Notional'].str.replace(' ', '0')
Then extract the numeric part and the 'K' part separately, map 'K' to 1000 (and a missing 'K' to 1), and multiply the two (np is numpy).
In [345]: (s.str.extract(r'(-?[\d\.]+)', expand=False).astype(float) *
     ...:  s.str.extract(r'([K]+)', expand=False).replace([np.nan, 'K'], [1, 1000]))
Out[345]:
0     0.00
1    -0.02
2    35.00
3    30.00
4   -50.00
5     0.00
Name: Notional, dtype: float64
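If the real column also contains int zeros, as the question describes, the .str accessor will not handle those cells directly. The two steps above can be sketched as a self-contained variant that casts to str first. This is an assumption-laden sketch, not a drop-in replacement: the sample Series is illustrative, and it assumes 'K' only ever appears as a trailing suffix.

```python
import numpy as np
import pandas as pd

# Illustrative column (assumption): int zeros mixed with string values
notional = pd.Series([0, '-0.02', '35', '. 3K', '-0.05K', 0], name='Notional')

# Cast to str so the .str accessor also works on the int cells,
# then fill the embedded space with '0' as in the first step above
s = notional.astype(str).str.replace(' ', '0', regex=False)

# Multiplier: 1000 where the value ends in 'K', otherwise 1
multiplier = np.where(s.str.endswith('K'), 1000.0, 1.0)

# Strip the trailing 'K' (if any) and convert the remainder to float
number = s.str.rstrip('K').astype(float)

result = number * multiplier
print(result)
```

Using endswith and rstrip instead of two regex extractions keeps the multiplier numeric from the start, which avoids relying on how replace infers the resulting dtype.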
Upvotes: 0
Reputation: 7516
Here is a possible solution:
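def notional_to_num(x):
    # Numeric cells (e.g. the int zeros) pass through unchanged
    if isinstance(x, (int, float)):
        return x
    elif isinstance(x, str):
        # Fill the embedded space with '0', drop the trailing 'K', scale by 1000;
        # strings without 'K' are returned as-is
        return x if 'K' not in x else float(x.replace(" ", "0")[:-1])*1e3
    else:
        # Anything else is unexpected
        raise TypeError(f"Unexpected value: {x!r}")

finalDf2.loc[:, 'Notional'] = finalDf2['Notional'].apply(notional_to_num)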
Which gives the following output:
  Notional Portfolio  Strike
0        0     Book1     108
1    -0.02     Book1  109.10
2       35     Book2     111
3       30     Book3     114
4      -50     Book1   108.3
5        0     Book1   115.0
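Strings without a 'K' are still strings after this step, so a follow-up astype(float) would be needed to get a numeric column. A minimal sketch of a variant that returns a float for every cell is below. The function name notional_to_float and the sample Series are hypothetical, and stripping outer whitespace before filling the embedded space is an added assumption.

```python
import pandas as pd

def notional_to_float(x):
    """Convert one cell to float, treating a trailing 'K' as a factor of 1000."""
    if isinstance(x, (int, float)):
        return float(x)
    if isinstance(x, str):
        # Assumption: outer whitespace is noise, an embedded space stands for '0'
        cleaned = x.strip().replace(' ', '0')
        if cleaned.endswith('K'):
            return float(cleaned[:-1]) * 1e3
        return float(cleaned)
    raise TypeError(f"Unexpected value: {x!r}")

# Illustrative column (assumption): int zeros mixed with string values
notional = pd.Series([0, '-0.02', '35', '. 3K', '-0.05K', 0], name='Notional')

converted = notional.apply(notional_to_float)
print(converted.dtype)
```

Because every branch returns a float, the resulting Series has a numeric dtype and no separate conversion step is required.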
Upvotes: 1