pythonRcpp
pythonRcpp

Reputation: 2146

Map to substring key in pandas

>>> print(df)
         out_0          out_1                b       c        e      d
0  out_GFD_994  out_GFD_Part2  NSE_FO_BHP_1703  -27022   361.45 -13200
1  out_GFD_994  out_GFD_Part2  NSE_FO_BHP_1704   30132   399.98  13200
2  out_GFD_994  out_GFD_Part2  NSE_FO_MQG_1703 -145173  1177.97 -11200
3  out_GFD_994  out_GFD_Part2  NSE_FO_MQG_1704  158747  1180.55  12300
4  out_GFD_994  out_GFD_Part2  NSE_FO_TLS_1703   23316  1247.50   3200
5  out_GFD_994  out_GFD_Part2  NSE_FO_TLS_1704  -19333  1253.62  -3200

Need to map each key in column b to value in file ( example NSE_FO_BHP_1703 matches to NSE_FO_BHP_1703_* in file) if key in file contains LONG as substring, then subtract value in column d if key in file contains SHORT as substring, then add value in column d

such that the NSE_FO_BHP_1703_MAXLONGPOS = 200000 changes to 213200 change NSE_FO_BHP_1703_MAXSHORTPOS to 186800. Change the MAXLONGEXPOSURE and MAXSHORTEXPOSURE to 263200 and 236800 respectively. This would be done for every key in b column of df.

NSE_FO_BHP_1703_MAXLONGPOS = 200000
NSE_FO_BHP_1703_MAXSHORTPOS = 200000
NSE_FO_BHP_1703_MAXLONGEXPOSURE = 250000
NSE_FO_BHP_1703_MAXSHORTEXPOSURE = 250000
NSE_FO_BHP_1704_MAXLONGPOS = 200000
NSE_FO_BHP_1704_MAXSHORTPOS = 200000
NSE_FO_BHP_1704_MAXLONGEXPOSURE = 250000
NSE_FO_BHP_1704_MAXSHORTEXPOSURE = 250000
NSE_FO_TLS_1703_MAXLONGPOS = 100000
NSE_FO_TLS_1703_MAXSHORTPOS = 100000
NSE_FO_TLS_1703_MAXLONGEXPOSURE = 200000
NSE_FO_TLS_1703_MAXSHORTEXPOSURE = 200000
NSE_FO_TLS_1704_MAXLONGPOS = 100000
NSE_FO_TLS_1704_MAXSHORTPOS = 100000
NSE_FO_TLS_1704_MAXLONGEXPOSURE = 200000
NSE_FO_TLS_1704_MAXSHORTEXPOSURE = 200000
NSE_FO_MQG_1703_MAXLONGPOS = 300000
NSE_FO_MQG_1703_MAXSHORTPOS = 300000
NSE_FO_MQG_1703_MAXLONGEXPOSURE = 400000
NSE_FO_MQG_1703_MAXSHORTEXPOSURE = 400000
NSE_FO_DEF_1704_MAXLONGPOS = 300000
NSE_FO_MQG_1704_MAXSHORTPOS = 300000
NSE_FO_MQG_1704_MAXLONGEXPOSURE = 400000
NSE_FO_MQG_1704_MAXSHORTEXPOSURE = 400000

Upvotes: 1

Views: 269

Answers (1)

jezrael
jezrael

Reputation: 863301

I think you can use if all data are in pair - *LONG* and *SHORT*:

from pandas.compat import StringIO

temp=u"""
NSE_FO_BHP_1703_MAXLONGPOS = 200000
NSE_FO_BHP_1703_MAXSHORTPOS = 200000
NSE_FO_BHP_1703_MAXLONGEXPOSURE = 250000
NSE_FO_BHP_1703_MAXSHORTEXPOSURE = 250000
NSE_FO_BHP_1704_MAXLONGPOS = 200000
NSE_FO_BHP_1704_MAXSHORTPOS = 200000
NSE_FO_BHP_1704_MAXLONGEXPOSURE = 250000
NSE_FO_BHP_1704_MAXSHORTEXPOSURE = 250000
NSE_FO_TLS_1703_MAXLONGPOS = 100000
NSE_FO_TLS_1703_MAXSHORTPOS = 100000
NSE_FO_TLS_1703_MAXLONGEXPOSURE = 200000
NSE_FO_TLS_1703_MAXSHORTEXPOSURE = 200000
NSE_FO_TLS_1704_MAXLONGPOS = 100000
NSE_FO_TLS_1704_MAXSHORTPOS = 100000
NSE_FO_TLS_1704_MAXLONGEXPOSURE = 200000
NSE_FO_TLS_1704_MAXSHORTEXPOSURE = 200000
NSE_FO_MQG_1703_MAXLONGPOS = 300000
NSE_FO_MQG_1703_MAXSHORTPOS = 300000
NSE_FO_MQG_1703_MAXLONGEXPOSURE = 400000
NSE_FO_MQG_1703_MAXSHORTEXPOSURE = 400000
NSE_FO_DEF_1704_MAXLONGPOS = 300000
NSE_FO_MQG_1704_MAXSHORTPOS = 300000
NSE_FO_MQG_1704_MAXLONGEXPOSURE = 400000
NSE_FO_MQG_1704_MAXSHORTEXPOSURE = 400000"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df2 = pd.read_csv(StringIO(temp), sep="\s+=\s+", engine='python', names=['A','B'])
#create dict for map
d = df1.set_index('b')['d'].to_dict()
#split by last _ and map
a = df2.A.str.rsplit('_', n=1).str[0].map(d)
#if LONG substract, else add
df2['C'] = np.where(df2.A.str.contains('LONG'), df2.B - a, df2.B + a)
print (df2)
                                   A       B         C
0         NSE_FO_BHP_1703_MAXLONGPOS  200000  213200.0
1        NSE_FO_BHP_1703_MAXSHORTPOS  200000  186800.0
2    NSE_FO_BHP_1703_MAXLONGEXPOSURE  250000  263200.0
3   NSE_FO_BHP_1703_MAXSHORTEXPOSURE  250000  236800.0
4         NSE_FO_BHP_1704_MAXLONGPOS  200000  186800.0
5        NSE_FO_BHP_1704_MAXSHORTPOS  200000  213200.0
6    NSE_FO_BHP_1704_MAXLONGEXPOSURE  250000  236800.0
7   NSE_FO_BHP_1704_MAXSHORTEXPOSURE  250000  263200.0
8         NSE_FO_TLS_1703_MAXLONGPOS  100000   96800.0
9        NSE_FO_TLS_1703_MAXSHORTPOS  100000  103200.0
10   NSE_FO_TLS_1703_MAXLONGEXPOSURE  200000  196800.0
11  NSE_FO_TLS_1703_MAXSHORTEXPOSURE  200000  203200.0
12        NSE_FO_TLS_1704_MAXLONGPOS  100000  103200.0
13       NSE_FO_TLS_1704_MAXSHORTPOS  100000   96800.0
14   NSE_FO_TLS_1704_MAXLONGEXPOSURE  200000  203200.0
15  NSE_FO_TLS_1704_MAXSHORTEXPOSURE  200000  196800.0
16        NSE_FO_MQG_1703_MAXLONGPOS  300000  311200.0
17       NSE_FO_MQG_1703_MAXSHORTPOS  300000  288800.0
18   NSE_FO_MQG_1703_MAXLONGEXPOSURE  400000  411200.0
19  NSE_FO_MQG_1703_MAXSHORTEXPOSURE  400000  388800.0
20        NSE_FO_DEF_1704_MAXLONGPOS  300000       NaN <-no data in df1 - map return NaN
21       NSE_FO_MQG_1704_MAXSHORTPOS  300000  312300.0
22   NSE_FO_MQG_1704_MAXLONGEXPOSURE  400000  387700.0
23  NSE_FO_MQG_1704_MAXSHORTEXPOSURE  400000  412300.0

Upvotes: 1

Related Questions