sam

Reputation: 203

How to split the text using Python?

f_output.write('\n{}, {}\n'.format(filename, summary))

I am writing the name of the file to the output, and I get VCALogParser_output_ARW.log, VCALogParser_output_CZC.log, and so on. I am only interested in printing ARW, CZC, and so on. Can someone please tell me how to split this text?

Upvotes: 1

Views: 135

Answers (6)

Andriy Ivaneyko

Reputation: 22021

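  1. Parse the file name correctly. My guess is that you want to strip the file extension .log and the prefix VCALogParser_output_. For that, str.replace is enough; there is no need for str.split.
  2. Use os.linesep when writing to the file to get cross-platform line endings. Note that a file opened in text mode already translates '\n' to the platform line ending on write, so there a plain '\n' is sufficient.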

The code below produces the desired result after applying the steps above:

    import os

    filename = 'VCALogParser_output_SOME_NAME.log'
    summary = 'some summary'
    fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
    linesep = os.linesep

    f_output.write('{linesep}{fname}, {summary}{linesep}'
                   .format(fname=fname, summary=summary, linesep=linesep))

    # or, if the variables in scope are strictly controlled, pass locals() into format
    f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))
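One caveat with str.replace is that it removes the substring wherever it occurs, not only at the ends of the name. A minimal sketch that strips the prefix and suffix only at the start and end, assuming the prefix and extension from the question (the helper name strip_affixes is made up for illustration):

```python
PREFIX = 'VCALogParser_output_'  # prefix taken from the question's file names
SUFFIX = '.log'                  # extension taken from the question's file names


def strip_affixes(filename):
    """Remove PREFIX from the start and SUFFIX from the end, if present."""
    if filename.startswith(PREFIX):
        filename = filename[len(PREFIX):]
    if filename.endswith(SUFFIX):
        filename = filename[:-len(SUFFIX)]
    return filename


print(strip_affixes('VCALogParser_output_ARW.log'))
```

On Python 3.9 and later, str.removeprefix and str.removesuffix do the same thing directly.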

Upvotes: 0

Iron Fist

Reputation: 10951

If you are only interested in CZC and ARW without the .log, you can do it with the re.search method:

>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'

Or, better, compile your pattern p and call its search method when formatting your string:

>>> p = re.compile(r'.*_(.*)\.log')
>>> 
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'

Also, it might be helpful to use re.sub with a positive lookbehind and a named group (note the raw string for the replacement, so that \g is not treated as a string escape):

>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>> 
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>> 
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'

If you cannot or do not want to use the re module, you can compute the lengths of the parts you don't need and slice your strings with them:

>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>> 
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'

But keep in mind that the above is valid only as long as all of your strings share that same prefix and suffix.
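Note also that re.search returns None when a name does not match, so calling .group(1) on the result raises AttributeError. A small sketch that guards against this, assuming names end in an underscore, a tag, and .log (the function name extract_tag and the group name tag are made up for illustration):

```python
import re

# Anything up to the last underscore, then the tag, then ".log" at the end.
PATTERN = re.compile(r'.*_(?P<tag>[^_.]+)\.log$')


def extract_tag(filename):
    """Return the tag between the last underscore and '.log', or None."""
    match = PATTERN.match(filename)
    return match.group('tag') if match else None


for name in ('VCALogParser_output_ARW.log', 'notes.txt'):
    print(name, '->', extract_tag(name))
```

Returning None lets the caller decide whether to skip the file or fall back to the full name.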

Upvotes: 0

MLSC

Reputation: 5972

You can also try:

>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'
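The fixed index 2 relies on there being exactly two underscores before the tag. A short sketch of what happens otherwise, using a hypothetical file name with one extra underscore (not from the question):

```python
name = 'VCALogParser_output_ARW.log'
# Hypothetical name with an extra underscore in the prefix.
longer = 'VCA_LogParser_output_ARW.log'

by_fixed_index = longer.split('_')[2].split('.')[0]
by_last_index = longer.split('_')[-1].split('.')[0]

print(name.split('_')[2].split('.')[0])  # ARW
print(by_fixed_index)                    # output
print(by_last_index)                     # ARW
```

Indexing with -1 takes the last underscore-delimited piece, so it does not depend on how many underscores the prefix contains.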

Upvotes: 0

ayyoob imani

Reputation: 639

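filename.split('_')[-1].split('.')[0]

This gives you 'ARW' when filename is 'VCALogParser_output_ARW.log', and 'CZC' when it is 'VCALogParser_output_CZC.log'. Apply it to filename only; summary is not a file name, so it should be written unchanged.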

Upvotes: 1

Hannes Ovrén

Reputation: 21831

If the file name always ends with _ARW.log, _CZC.log, or similar, this is easy to do with the standard string split() method, using two consecutive splits:

shortname = filename.split("_")[-1].split('.')[0]

Or, to make it (arguably) a bit more readable, we can use the os module:

import os

shortname = os.path.splitext(filename)[0].split("_")[-1]
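To tie this back to the write call in the question, here is a minimal sketch. The io.StringIO object stands in for the question's f_output file handle, and the results list and summary strings are made-up placeholders:

```python
import io
import os


def short_name(filename):
    """Return the part between the last underscore and the extension."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.split('_')[-1]


# Stand-in for the real output file; replace with the actual file object.
f_output = io.StringIO()

# Placeholder (filename, summary) pairs for illustration only.
results = [
    ('VCALogParser_output_ARW.log', 'summary A'),
    ('VCALogParser_output_CZC.log', 'summary B'),
]

for filename, summary in results:
    f_output.write('\n{}, {}\n'.format(short_name(filename), summary))

print(f_output.getvalue())
```

Using os.path.basename first means the function also works if filename carries a directory part.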

Upvotes: 0

Jacob H

Reputation: 876

fname.split('_')[-1] 

is rough, but it will give you 'CZC.log', 'ARW.log', and so on, assuming that all files have the same underscore-delimited format.

Upvotes: 0
