Reputation: 21
I have a list of values, and I need to split each one on the date pattern dd mmm yyyy.
input_list = ['01 Feb 2023 CNY 0.00 559,929.32 **01 Feb 2023 CNY 0.00 0.00**',
              '01 Feb 2023 HKD 0.00 22,441,926.65 **01 Feb 2023 HKD 0.00 0.00**',
              '01 Feb 2023 USD 0.00 585,686.14 **01 Feb 2023 USD 0.00 0.00**']
I am hoping to get this result as a DataFrame:
currency  1stvalue  2ndvalue
CNY       0.00      0.00
HKD       0.00      0.00
USD       0.00      0.00
Any direction or suggestion would help, thank you.
I found a similar method but still can't figure it out: "Regex to split string on date and keep it".
Similar example:
import re
rx = r"\b\d+/\d+/\d+.*?(?=\b\d+/\d+/\d+|$)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.findall(rx, s)
print(results)
Upvotes: 2
Views: 94
Reputation: 78
I hope this solves your problem.
import re
import pandas as pd
input_list = ['01 Feb 2023 CNY 0.00 559,929.32 **01 Feb 2023 CNY 0.00 0.00**',
              '01 Feb 2023 HKD 0.00 22,441,926.65 **01 Feb 2023 HKD 0.00 0.00**',
              '01 Feb 2023 USD 0.00 585,686.14 **01 Feb 2023 USD 0.00 0.00**']
# Matches a three-character word followed by a number, e.g. ('Feb', '2023') or ('CNY', '0.00').
# Note: only the first number after each currency code is captured.
currency_pattern = re.compile(r'(\w{3})\s+([0-9,.]+)')
output_dict = {}
for item in input_list:
    matches = re.findall(currency_pattern, item)
    # Each item yields four matches: date, currency, date, currency.
    # Drop the two ('Feb', '2023') date matches and keep the currency ones.
    matches.pop(0)
    matches.pop(1)
    if matches:
        for match in matches:
            currency = match[0]
            value = match[1]
            if currency not in output_dict:
                output_dict[currency] = []
            output_dict[currency].append(value)
# One row per currency, one column per captured value.
new_df = pd.DataFrame.from_dict(output_dict, orient='index', columns=['value1', 'value2'])
new_df = new_df.rename_axis('currency').reset_index()
print(new_df)
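Since the question asks about splitting on the date itself, here is a minimal sketch of that approach. It adapts the lookahead idea from the question's similar example to the dd mmm yyyy format. The helper names `split_on_dates` and `last_segment_row` are made up for illustration, and the sketch assumes every string contains at least one date and that the wanted values are in the last date-led segment.

```python
import re

# Date pattern from the question: dd mmm yyyy, e.g. "01 Feb 2023".
DATE = r"\d{2} [A-Za-z]{3} \d{4}"

# A segment starts at a date and runs lazily up to the next date
# or the end of the string (the lookahead keeps the next date intact).
SEGMENT = re.compile(rf"{DATE}.*?(?={DATE}|$)")


def split_on_dates(text):
    """Return the date-led segments of text with the '*' markers removed."""
    cleaned = text.replace("*", "")
    return [segment.strip() for segment in SEGMENT.findall(cleaned)]


def last_segment_row(text):
    """Return (currency, first value, second value) from the last segment."""
    fields = split_on_dates(text)[-1].split()
    # fields has the shape ['01', 'Feb', '2023', 'CNY', '0.00', '0.00'].
    return tuple(fields[3:6])


input_list = ['01 Feb 2023 CNY 0.00 559,929.32 **01 Feb 2023 CNY 0.00 0.00**',
              '01 Feb 2023 HKD 0.00 22,441,926.65 **01 Feb 2023 HKD 0.00 0.00**',
              '01 Feb 2023 USD 0.00 585,686.14 **01 Feb 2023 USD 0.00 0.00**']

rows = [last_segment_row(item) for item in input_list]
print(rows)
```

Passing `rows` to `pd.DataFrame(rows, columns=['currency', '1stvalue', '2ndvalue'])` should then give the table from the question.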
Upvotes: 2
Reputation: 147
You can try:
import re
# [A-Za-z] avoids matching a stray comma in the month; [\d,]+ allows values larger than 9.99.
pattern = r"\d{2} [A-Za-z]{3} [0-9]{4} ([A-Z]{3}) ([\d,]+\.\d{2})"
string = "01 Feb 2023 CNY 0.00 559,929.32 **01 Feb 2023 CNY 0.00 0.00**"
responses = re.findall(pattern, string)
# Print the currency once, then the first value found after each date.
for i, response in enumerate(responses):
    if i == 0:
        print(response[0], end="")
    print(" ", response[1], end="")
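If you need both values that follow each date, and want them as numbers rather than strings, one option is a single pattern with named groups. This is only a sketch: the names `ROW`, `to_decimal` and `parse_rows` are hypothetical, and it assumes every amount has exactly two decimal places and uses commas as thousands separators, as in the sample data.

```python
import re
from decimal import Decimal

# One date-led record: date, currency code, then two amounts.
ROW = re.compile(
    r"(?P<date>\d{2} [A-Za-z]{3} \d{4}) "
    r"(?P<currency>[A-Z]{3}) "
    r"(?P<first>[\d,]+\.\d{2}) "
    r"(?P<second>[\d,]+\.\d{2})"
)


def to_decimal(amount):
    """Convert a string such as '559,929.32' to a Decimal."""
    return Decimal(amount.replace(",", ""))


def parse_rows(text):
    """Return (currency, first, second) for every date-led record in text."""
    return [
        (m["currency"], to_decimal(m["first"]), to_decimal(m["second"]))
        for m in ROW.finditer(text)
    ]


line = "01 Feb 2023 CNY 0.00 559,929.32 **01 Feb 2023 CNY 0.00 0.00**"
parsed = parse_rows(line)
# The last record is the one wrapped in ** in the question.
print(parsed[-1])
```

`Decimal` is used here instead of `float` so that currency amounts keep their exact two-decimal representation.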
Upvotes: 2