zeuzzz_m

Reputation: 21

Python regex: how do I split a string on a date pattern and keep only the values after the second occurrence of the pattern?

I have a list of values that I need to split on the date pattern dd mmm yyyy:

input_list = ['01 Feb 2023 CNY  0.00  559,929.32 **01 Feb 2023 CNY  0.00  0.00**',
 '01 Feb 2023 HKD  0.00  22,441,926.65 **01 Feb 2023 HKD  0.00  0.00**',
 '01 Feb 2023 USD  0.00  585,686.14 **01 Feb 2023 USD  0.00  0.00**']

I am hoping to get this result as a DataFrame:

currency     1stvalue   2ndvalue
CNY           0.00        0.00
HKD           0.00        0.00
USD           0.00        0.00

Any direction or suggestion would help, thank you.

I found a similar method but still can't figure it out: Regex to splitstring on date and keep it

Similar example:

import re
rx = r"\b\d+/\d+/\d+.*?(?=\b\d+/\d+/\d+|$)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.findall(rx, s)
print(results)
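
For the dd mmm yyyy format, the same lookahead idea might be adapted roughly as below. This is only a sketch: the names `DATE`, `segment_rx` and `second` are illustrative, and it assumes every date looks like "01 Feb 2023" (two digits, a three-letter month, four digits).

```python
import re

# Assumed date shape: two digits, a three-letter month, four digits (e.g. "01 Feb 2023").
DATE = r"\d{2} [A-Za-z]{3} \d{4}"

# Each match starts at a date and runs lazily up to the next date or the end of the string.
segment_rx = re.compile(rf"{DATE}.*?(?={DATE}|$)")

s = "01 Feb 2023 CNY  0.00  559,929.32 **01 Feb 2023 CNY  0.00  0.00**"
segments = segment_rx.findall(s)

# Keep only the text that starts at the second date, if there is one.
second = segments[1] if len(segments) > 1 else None
print(second)
```

The currency and the two values would still have to be pulled out of `second` afterwards.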

Upvotes: 2

Views: 94

Answers (2)

I hope this solves your problem:

import re

import pandas as pd

input_list = ['01 Feb 2023 CNY  0.00  559,929.32 **01 Feb 2023 CNY  0.00  0.00**',
 '01 Feb 2023 HKD  0.00  22,441,926.65 **01 Feb 2023 HKD  0.00  0.00**',
 '01 Feb 2023 USD  0.00  585,686.14 **01 Feb 2023 USD  0.00  0.00**']

# A three-letter currency code followed by its two amounts, e.g. "CNY  0.00  0.00".
# Upper case only, so the month name ("Feb") is not mistaken for a currency.
currency_pattern = re.compile(r'([A-Z]{3})\s+([0-9,.]+)\s+([0-9,.]+)')

output_dict = {}

for item in input_list:
    matches = currency_pattern.findall(item)
    if matches:
        # The last match is the block that follows the second date
        currency, value1, value2 = matches[-1]
        output_dict[currency] = [value1, value2]

new_df = pd.DataFrame.from_dict(output_dict, orient='index', columns=['value1', 'value2'])
new_df = new_df.rename_axis('currency').reset_index()
print(new_df)

Upvotes: 2

manduinca

Reputation: 147

You can try:

import re

# Date (dd Mon yyyy), currency code, then the two amounts that follow it
pattern = r"\d{2} [A-Za-z]{3} \d{4} ([A-Z]{3})\s+([\d,]+\.\d{2})\s+([\d,]+\.\d{2})"
string = "01 Feb 2023 CNY  0.00  559,929.32 **01 Feb 2023 CNY  0.00  0.00**"
responses = re.findall(pattern, string)

# printer: the last match is the block after the second date
if responses:
    currency, value1, value2 = responses[-1]
    print(currency, value1, value2)
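
If you need this for the whole list, one possible variation is to split each line on the date pattern and read the currency and amounts from the piece after the second date. This is a rough sketch, not a tested solution for your real data: `parse_after_second_date`, `DATE_RX`, `AMOUNT_RX` and `CURRENCY_RX` are names made up here, and it assumes each line has exactly the layout shown in the question.

```python
import re

# Assumed date shape, as in the question: "dd Mon yyyy".
DATE_RX = re.compile(r"\d{2} [A-Za-z]{3} \d{4}")
# Amounts such as "0.00" or "22,441,926.65".
AMOUNT_RX = re.compile(r"[\d,]+\.\d+")
# Three upper-case letters, e.g. "CNY".
CURRENCY_RX = re.compile(r"[A-Z]{3}")


def parse_after_second_date(line):
    """Return (currency, first_amount, second_amount) from the text after the second date."""
    parts = DATE_RX.split(line)
    # parts[0] precedes the first date, parts[1] follows the first date,
    # parts[2] follows the second date.
    if len(parts) < 3:
        return None
    tail = parts[2]
    currency = CURRENCY_RX.search(tail)
    amounts = AMOUNT_RX.findall(tail)
    if currency is None or len(amounts) < 2:
        return None
    # Strip thousands separators before converting to float.
    first, second = (float(a.replace(",", "")) for a in amounts[:2])
    return currency.group(), first, second


input_list = ['01 Feb 2023 CNY  0.00  559,929.32 **01 Feb 2023 CNY  0.00  0.00**',
 '01 Feb 2023 HKD  0.00  22,441,926.65 **01 Feb 2023 HKD  0.00  0.00**',
 '01 Feb 2023 USD  0.00  585,686.14 **01 Feb 2023 USD  0.00  0.00**']

rows = [row for row in map(parse_after_second_date, input_list) if row is not None]
print(rows)
```

The resulting list of tuples can be passed to `pd.DataFrame(rows, columns=['currency', '1stvalue', '2ndvalue'])` if you want the table from the question.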

Upvotes: 2
