Barak

Reputation: 61

python open csv search for pattern and strip everything else

I have a CSV file, 'svclist.csv', which contains a single-column list as follows:

pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs

I need to strip everything from each line except the PL5 directory and the two digits in the last path component, so the result looks like this:

PL5,00
PL5,01

I started the code as follows:

clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        if line.__contains__('profile'):
            print(line, end='')

and I'm stuck here.

Thanks in advance for the help.

Upvotes: 1

Views: 54

Answers (3)

Nk03

Reputation: 14949

You can use the regular expression (PL5)[^/].{0,}([0-9]{2,2})

For an explanation, paste the regex into https://regexr.com. It shows how the regex works, and you can adjust it as needed.

import re

test_string_list = ['pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1',
                    'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs']

regex = re.compile(r"(PL5)[^/].{0,}([0-9]{2,2})")
result = []
for test_string in test_string_list:
    # findall returns a list of (group 1, group 2) tuples, e.g. [('PL5', '00')]
    match_list = regex.findall(test_string)
    result.append(match_list[0])

with open('outfile.csv', 'w') as f:
    for row in result:
        # join the tuple into one comma-separated line, e.g. PL5,00
        f.write(','.join(row) + '\n')

In the above code, I create an empty list to hold the tuples returned by findall. Each tuple is then joined with a comma via ','.join(row) and written as one line to 'outfile.csv'.
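
The same regex can also be applied to the lines of the input file instead of a hard-coded list. The following is only a minimal sketch: the helper names extract_pairs and write_pairs are made up for illustration, and it assumes every relevant line contains the profile name in the form shown in the question.

```python
import csv
import re

# Same pattern as in the answer, written as a raw string.
REGEX = re.compile(r"(PL5)[^/].{0,}([0-9]{2,2})")


def extract_pairs(lines):
    """Return a (system, number) tuple for every line the pattern matches."""
    pairs = []
    for line in lines:
        match = REGEX.search(line)
        if match:  # lines without a match are skipped
            pairs.append(match.groups())
    return pairs


def write_pairs(pairs, out_file):
    """Write each tuple as one comma-separated row to an open file object."""
    csv.writer(out_file, lineterminator="\n").writerows(pairs)
```

It could be used like this:

    with open('svclist.csv', 'rt') as src, open('outfile.csv', 'w', newline='') as dst:
        write_pairs(extract_pairs(src), dst)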

Upvotes: 1

PonyTale

Reputation: 306

I made the assumption that the number is always between the two underscores. You could run something similar to this within your for-loop.

test_str = "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1"

test_list = test_str.split("_")  # split the string at the underscores

# strip leading and trailing ASCII letters, e.g. "D00" -> "00"
output = test_list[1].strip(
    "abcdefghijklmnopqrstuvwxyz" + str.swapcase("abcdefghijklmnopqrstuvwxyz"))

try:
    int(output)  # raises ValueError if any non-digit characters are left
    print(f"PL5,{output}")
except ValueError:
    print(f'Something went wrong! Output is PL5,{output}')
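
To run this inside the for-loop, the split-and-strip steps could be wrapped in a small helper. This is only a sketch under the same assumption (the number sits between the first two underscores); number_from_line is a hypothetical name, and it returns None instead of printing an error.

```python
import string


def number_from_line(line):
    """Return the digits between the first two underscores, or None."""
    parts = line.strip().split("_")
    if len(parts) < 3:
        return None  # no field enclosed by two underscores
    # remove leading and trailing ASCII letters, e.g. "D00" -> "00"
    digits = parts[1].strip(string.ascii_letters)
    return digits if digits.isdecimal() else None
```

Inside the loop, a line would then be printed with print(f"PL5,{number}") whenever number_from_line(line) does not return None.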

Upvotes: 0

Ron Serruya

Reputation: 4446

You can use a regex for this (in general, when trying to extract a pattern, this might be a good option).

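import re

# compile the pattern so that findall() is available on it
pattern = re.compile(r"pf=/usr/sap/PL5/SYS/profile/PL5_.*(\d{2})")

with open('svclist.csv', 'rt') as f:
    for line in f:
        if 'profile' in line:
            # findall returns the captured group of each match
            last_two_numbers = pattern.findall(line)[0]
            print(f'PL5,{last_two_numbers}')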

This code goes over each line, checks whether "profile" is in the line (this is the same as calling __contains__), then extracts the last two consecutive digits according to the pattern.
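
If the file might also list systems other than PL5, the system ID could be captured instead of hard-coded. The sketch below rests on assumptions that go beyond the question: the ID is the directory right after /usr/sap/, it is repeated at the start of the profile name, and the instance field consists of non-digit characters followed by exactly two digits. The name sid_and_number is made up for illustration.

```python
import re

# Assumption: <SID> follows /usr/sap/ and is repeated in the profile name as
# <SID>_<non-digits><NN>_<host>; (?P=sid) is a backreference to the named group.
PROFILE = re.compile(
    r"pf=/usr/sap/(?P<sid>[^/]+)/SYS/profile/(?P=sid)_\D*(?P<nr>\d{2})_"
)


def sid_and_number(line):
    """Return 'SID,NN' for a matching line, or None otherwise."""
    match = PROFILE.search(line)
    if match is None:
        return None
    return f"{match['sid']},{match['nr']}"
```

Lines that do not follow this layout yield None, so they can be skipped or reported by the caller.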

Upvotes: 1
