Reputation: 5184
I have a df
that looks like this
Identifier Ticker ISIN Relationship Country .......
0 5.037663e+09 BTGGg.F D10080162 Supplier Germany .......
1 4.295870e+09 IVXG.DE NaN Supplier Germany .......
2 5.043321e+09 SAPG.DE D66992104 Customer Germany
3 4.295869e+09 BMWG.DE D12096109 Customer Germany
4 4.295870e+09 DTEGn.DE D2035M136 Customer Germany
5 4.295870e+09 IFXGn.DE D35415104 Supplier Germany
6 4.295869e+09 NSUG.DE D04312100 Customer Germany
7 5.000074e+09 EVKn.DE D2R90Y117 Customer Germany
8 4.295869e+09 LHAG.DE D1908N106 Customer Germany
9 4.295869e+09 MTXGn.DE D5565H104 Supplier Germany
10 4.295869e+09 SIEGn.DE D69671218 Supplier Germany
11 4.295870e+09 TKAG.DE D8398Q119 Supplier Germany
12 5.059963e+09 BNRGn.DE D12459117 Customer Germany
13 4.295869e+09 RHMG.DE D65111102 Supplier Germany
14 5.001195e+09 GBFG.DE D11648108 Supplier Germany
15 4.295869e+09 NXUG.DE D5650J106 Customer Germany
16 4.295870e+09 DPWGn.DE D19225107 Supplier Germany
17 4.295870e+09 ILM1k.DE D22430116 Supplier Germany
18 4.295869e+09 ADSGn.DE D0066B185 Customer Germany
19. 5.125125e+12 DBS.SG D12300523. Supplier SG
........................................................................
In the rows where the df['Country'] == 'Germany',
I have 2 functions I want to perform.
function 1: I want to isolate the rows where there is a smaller case letter, any smaller case letter before the ".", I want to drop it, so BTGGg.F would become BTGG.F, 1LM1k.DE would become 1LM1.DE but NXUG.DE would be unaffected.
Using that new dataframe after the first function.
function 2: Then for the rows where there is an upper case G before the ".", I want to drop the G so RHMG.DE would become RHM.DE but 1LM1.DE would be unaffected.
If I wanted to drop just the letter before the ".", the function would be simple like x = x.replace(x[x.find(".")-1],"")
.
But I don't know how to do this in a dataframe, and I don't know how to apply the conditions I mentioned. Can it be done and if so, how?
I was thinking it might look something like this but this obviously didn't work I already tried it.
df.loc[df['Country'].eq('Germany'),'Ticker'] = df.loc[df['Country'].eq('Germany'),'Ticker'].str.replace((df['Ticker'][df['Ticker'].find(".")-1],"") if df['Ticker'][df['Ticker'].find(".")-1] == '([a-z])')
followed by
df.loc[df['Country'].eq('Germany'),'Ticker'] = df.loc[df['Country'].eq('Germany'),'Ticker'].str.replace((df['Ticker'][df['Ticker'].find(".")-1],"") if df['Ticker'][df['Ticker'].find(".")-1] == 'G')
Here is what the output should look like after the first round, from Country == Germany, remove the first lower case letter preceding "." :
Identifier Ticker ISIN Relationship Country .......
0 5.037663e+09 BTGG.F D10080162 Supplier Germany .......
1 4.295870e+09 IVXG.DE NaN Supplier Germany .......
2 5.043321e+09 SAPG.DE D66992104 Customer Germany
3 4.295869e+09 BMWG.DE D12096109 Customer Germany
4 4.295870e+09 DTEG.DE D2035M136 Customer Germany
5 4.295870e+09 IFXG.DE D35415104 Supplier Germany
6 4.295869e+09 NSUG.DE D04312100 Customer Germany
7 5.000074e+09 EVK.DE D2R90Y117 Customer Germany
8 4.295869e+09 LHAG.DE D1908N106 Customer Germany
9 4.295869e+09 MTXG.DE D5565H104 Supplier Germany
10 4.295869e+09 SIEG.DE D69671218 Supplier Germany
11 4.295870e+09 TKAG.DE D8398Q119 Supplier Germany
12 5.059963e+09 BNRG.DE D12459117 Customer Germany
13 4.295869e+09 RHMG.DE D65111102 Supplier Germany
14 5.001195e+09 GBFG.DE D11648108 Supplier Germany
15 4.295869e+09 NXUG.DE D5650J106 Customer Germany
16 4.295870e+09 DPWG.DE D19225107 Supplier Germany
17 4.295870e+09 ILM1.DE D22430116 Supplier Germany
18 4.295869e+09 ADSG.DE D0066B185 Customer Germany
19. 5.125125e+12 DBS.SG D12300523 Supplier SG
........................................................................
this is after the second round, from Country == Germany, remove the first upper case "G" preceding the ".":
Identifier Ticker ISIN Relationship Country .......
0 5.037663e+09 BTG.F D10080162 Supplier Germany .......
1 4.295870e+09 IVX.DE NaN Supplier Germany .......
2 5.043321e+09 SAP.DE D66992104 Customer Germany
3 4.295869e+09 BMW.DE D12096109 Customer Germany
4 4.295870e+09 DTE.DE D2035M136 Customer Germany
5 4.295870e+09 IFX.DE D35415104 Supplier Germany
6 4.295869e+09 NSU.DE D04312100 Customer Germany
7 5.000074e+09 EVK.DE D2R90Y117 Customer Germany
8 4.295869e+09 LHA.DE D1908N106 Customer Germany
9 4.295869e+09 MTX.DE D5565H104 Supplier Germany
10 4.295869e+09 SIE.DE D69671218 Supplier Germany
11 4.295870e+09 TKA.DE D8398Q119 Supplier Germany
12 5.059963e+09 BNR.DE D12459117 Customer Germany
13 4.295869e+09 RHM.DE D65111102 Supplier Germany
14 5.001195e+09 GBF.DE D11648108 Supplier Germany
15 4.295869e+09 NXU.DE D5650J106 Customer Germany
16 4.295870e+09 DPW.DE D19225107 Supplier Germany
17 4.295870e+09 ILM1.DE D22430116 Supplier Germany
18 4.295869e+09 ADS.DE D0066B185 Customer Germany
19. 5.125125e+12 DBS.SG D12300523 Supplier SG
........................................................................
Upvotes: 0
Views: 69
Reputation: 81
For dealing with the functions you mentioned, you can use some regex to perform them both
re.sub(r'G?[a-z]?(?=\.)', '', 'HERE_YOUR_COUNTRY_THINGY')
You didn't given more rules
Upvotes: 0
Reputation: 195508
df.loc[df.Country == 'Germany', 'Ticker'] = df[df.Country == 'Germany']['Ticker'].str.replace(r'[a-z](?=\.)', '').str.replace(r'G(?=\.)', '')
print(df)
Prints:
Identifier Ticker ISIN Relationship Country
0 5.037663e+09 BTG.F D10080162 Supplier Germany
1 4.295870e+09 IVX.DE NaN Supplier Germany
2 5.043321e+09 SAP.DE D66992104 Customer Germany
3 4.295869e+09 BMW.DE D12096109 Customer Germany
4 4.295870e+09 DTE.DE D2035M136 Customer Germany
5 4.295870e+09 IFX.DE D35415104 Supplier Germany
6 4.295869e+09 NSU.DE D04312100 Customer Germany
7 5.000074e+09 EVK.DE D2R90Y117 Customer Germany
8 4.295869e+09 LHA.DE D1908N106 Customer Germany
9 4.295869e+09 MTX.DE D5565H104 Supplier Germany
10 4.295869e+09 SIE.DE D69671218 Supplier Germany
11 4.295870e+09 TKA.DE D8398Q119 Supplier Germany
12 5.059963e+09 BNR.DE D12459117 Customer Germany
13 4.295869e+09 RHM.DE D65111102 Supplier Germany
14 5.001195e+09 GBF.DE D11648108 Supplier Germany
15 4.295869e+09 NXU.DE D5650J106 Customer Germany
16 4.295870e+09 DPW.DE D19225107 Supplier Germany
17 4.295870e+09 ILM1.DE D22430116 Supplier Germany
18 4.295869e+09 ADS.DE D0066B185 Customer Germany
19 5.125125e+12 DBS.SG D12300523. Supplier SG
Upvotes: 1