Viv

Reputation: 1584

Print only second part of a string split

I'm trying to split a string on the -- separator and print only the data that follows it.

For example:

1.Cleveland-Elyria-Mentor    OH--17460
2.Dallas-Plano-Irving    TX  (MSAD)--19100
etc.

How can I print only:

a. OH
   TX (MSAD)

and

b. 17460
   19100

Code:

#!/usr/bin/env python3
import csv
import re

sample = """columnA,ColumnB,columnC
1,Cleveland-Elyria-Mentor    OH--17460
2,Dallas-Plano-Irving    TX  (MSAD)--19100
3,ASJDFJKDJ-kD-JE       WA--21092"""

# write the sample data to a file
with open('sample.csv', 'w') as f:
    f.write(sample)

with open('sample.csv', newline='') as infile, open('final_output.csv', 'w', newline='') as output:
    reader = csv.reader(infile)
    writer = csv.writer(output)
    # discard input header
    next(reader)
    # write output header
    writer.writerow(['col1', 'col2', 'col3'])
    # process rows
    for row in reader:
        if row:
            # writes one output row per piece, so both the text before
            # '--' and the number after it end up in the output
            for stsplit in re.split(r'--', row[1]):
                writer.writerow([row[0], stsplit, row[1]])

with open('final_output.csv') as f:
    print(f.read())

Upvotes: 0

Views: 1666

Answers (3)

OneCricketeer

Reputation: 191733

rsplit doesn't use regular expressions (it splits on a literal string), so try actually using a regex.

import re

s = """1.Cleveland-Elyria-Mentor    OH--17460
2.Dallas-Plano-Irving    TX  (MSAD)--19100"""

for line in s.splitlines():
    # state: the first two-character word; zip: the five digits after '--' at the end
    match = re.search(r'(?P<state>\b\w{2}\b).*--(?P<zip>\d{5})$', line)
    print(match.group('state'), match.group('zip'))

Output

OH 17460
TX 19100
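The pattern above captures only the two letters, so for the second line it prints TX rather than the TX (MSAD) the question asks for. A minimal variant, under the assumption (taken from the sample lines, not from any stated format) that the city part and the state part are separated by a run of at least two spaces, captures everything between that gap and '--'. It also skips lines that don't match instead of raising AttributeError when re.search returns None:

```python
import re

s = """1.Cleveland-Elyria-Mentor    OH--17460
2.Dallas-Plano-Irving    TX  (MSAD)--19100"""

# Assumption: city and state are separated by two or more spaces,
# as in the sample lines; the state runs up to the '--'.
PATTERN = re.compile(r'\s{2,}(?P<state>.+?)--(?P<zip>\d+)$')

results = []
for line in s.splitlines():
    match = PATTERN.search(line)
    if match is None:  # skip lines that don't fit the assumed layout
        continue
    # collapse internal whitespace: 'TX  (MSAD)' -> 'TX (MSAD)'
    state = ' '.join(match.group('state').split())
    results.append((state, match.group('zip')))

for state, zip_code in results:
    print(state, zip_code)
```

Normalizing the whitespace afterwards keeps the regex simple while still producing the single-spaced TX (MSAD) shown in the question.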

Upvotes: 3

marcinowski

Reputation: 359

I hope I understand you correctly. split creates a list of the elements between occurrences of the string you pass as a separator; rsplit does the same but scans from the end of the string (which matters if, for example, you specify the maxsplit argument). For you the difference is not important, so you can use split. It will create a list of elements:

['Cleveland-Elyria-Mentor    OH', '17460']

You want the 17460. It's the last element of the list, so the code you need is:

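# row is a list from csv.reader, so split its second field
fipsplit = row[1].split('--')[-1]
# writerow expects a sequence of fields; wrap the string in a list
writer.writerow([fipsplit])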
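To make the split/rsplit difference concrete, and to show how this one-liner fits into the question's csv loop, here is a minimal sketch. It inlines the question's sample rows with io.StringIO instead of reading sample.csv (an assumption made only so the snippet runs on its own) and writes the parts before and after '--' into separate columns:

```python
import csv
import io

# split(sep, 1) cuts at the FIRST separator, rsplit(sep, 1) at the LAST;
# the two only differ when the separator appears more than once.
print('a--b--c'.split('--', 1))   # ['a', 'b--c']
print('a--b--c'.rsplit('--', 1))  # ['a--b', 'c']

# The question's sample rows, inlined instead of reading sample.csv.
data = io.StringIO(
    "columnA,ColumnB,columnC\n"
    "1,Cleveland-Elyria-Mentor    OH--17460\n"
    "2,Dallas-Plano-Irving    TX  (MSAD)--19100\n"
)
out = io.StringIO()
reader = csv.reader(data)
writer = csv.writer(out)
next(reader)  # skip the input header
writer.writerow(['col1', 'col2', 'col3'])
for row in reader:
    if row:
        # row is a list; split its second field once, at the last '--'
        place, code = row[1].rsplit('--', 1)
        # pass a list so each value becomes one CSV field
        writer.writerow([row[0], place, code])

print(out.getvalue())
```

Using rsplit with maxsplit=1 keeps the number intact even if the text before the last '--' happens to contain another '--'.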

Upvotes: 0

Haifeng Zhang

Reputation: 31895

Taking the numeric values as an example:

import re

# lookbehind: match the digits that immediately follow '--'
DOUBLE_DASH = r"(?<=--)\d+"

def grab_numeric(line, pattern=DOUBLE_DASH):
    result = re.search(pattern, line)
    num = result.group(0) if result else None
    return num

with open("sample.csv") as inputs:
    for line in inputs:
        result = grab_numeric(line)
        if result is not None:  # skip lines without '--<digits>'
            print(result)

Put your content into sample.csv and the code into test.py, then run:

python test.py

Output:

17460
19100

Grabbing OH and TX is similar: do a bit of research yourself and replace the pattern I provided. Hope it helps.
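As a sketch of that pattern swap, assuming the state is a two-letter uppercase code, optionally followed by a parenthesized suffix such as (MSAD), that sits right before '--': the lookahead (?=--) mirrors the lookbehind in DOUBLE_DASH. The grab helper and the STATE pattern are hypothetical generalizations of grab_numeric, and the sample lines are inlined rather than read from sample.csv:

```python
import re

# Hypothetical pattern: two uppercase letters, optionally followed by a
# parenthesized uppercase suffix, immediately before '--' (lookahead).
STATE = r"\b[A-Z]{2}\b(?:\s+\([A-Z]+\))?(?=--)"

def grab(line, pattern):
    """Return the first match of pattern in line, or None."""
    result = re.search(pattern, line)
    return result.group(0) if result else None

lines = [
    "1.Cleveland-Elyria-Mentor    OH--17460",
    "2.Dallas-Plano-Irving    TX  (MSAD)--19100",
]
states = []
for line in lines:
    state = grab(line, STATE)
    if state is not None:
        # collapse the double space in 'TX  (MSAD)'
        states.append(' '.join(state.split()))
print(states)
```

Requiring uppercase letters keeps the pattern from matching mixed-case fragments of the city names, but it will not cover state parts in other shapes, so check it against your real data.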

Upvotes: 3
