Reputation: 1584
I'm trying to split a string on the -- separator and print the data after it.
Example input:
1.Cleveland-Elyria-Mentor OH--17460
2.Dallas-Plano-Irving TX (MSAD)--19100
etc.
How can I print only:
a. the state part: OH, TX (MSAD)
and
b. the number: 17460, 19100
Code:
#!/usr/bin/python
import csv
import re

sample = """columnA,ColumnB,columnC
1,Cleveland-Elyria-Mentor OH--17460
2,Dallas-Plano-Irving TX (MSAD)--19100
3,ASJDFJKDJ-kD-JE WA--21092"""
open('sample.csv', 'w').write(sample)

with open('sample.csv') as sample, open('final_output.csv', 'w') as output:
    reader = csv.reader(sample)
    writer = csv.writer(output)
    # discard input header
    next(reader)
    # write output header
    writer.writerow(['col1', 'col2', 'col3'])
    # process rows
    for row in reader:
        if row:
            for stsplit in re.split(r'--', row[1]):
                writer.writerow([row[0], stsplit, row[1]])
print(open('final_output.csv').read())
Upvotes: 0
Views: 1666
Reputation: 191733
rsplit isn't regex, so try actually using a regex.
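import re

s = """1.Cleveland-Elyria-Mentor OH--17460
2.Dallas-Plano-Irving TX (MSAD)--19100"""

for line in s.split('\n'):
    # state: first standalone run of exactly two word characters;
    # zip: the five digits after "--" at the end of the line
    match = re.search(r'(?P<state>\b\w{2}\b).*--(?P<zip>\d{5})$', line)
    print(match.group('state'), match.group('zip'))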
Output
OH 17460
TX 19100
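If some lines might not match (a header row or a blank line, say), re.search returns None and match.group raises AttributeError. A minimal sketch of a guarded variant using the same pattern; the helper name parse_line is made up for illustration:

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r'(?P<state>\b\w{2}\b).*--(?P<zip>\d{5})$')

def parse_line(line):
    """Return (state, zip) for a matching line, or None if nothing matches."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group('state'), match.group('zip')

for line in ["1.Cleveland-Elyria-Mentor OH--17460", "columnA,ColumnB,columnC"]:
    print(parse_line(line))
```

Returning None lets the caller decide whether to skip the line or report it.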
Upvotes: 3
Reputation: 359
I hope I understand you correctly. split creates a list of the elements between occurrences of the string you pass as a separator; rsplit does the same but scans from the end of the string (which matters if you specify the maxsplit argument, for example). For you the difference is not important and you can use split, which will create a list of elements
['Cleveland-Elyria-Mentor OH', '17460']
You want 17460, the last element of the list, so the code you need is
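# row is a list from csv.reader, so split the column that holds the text
fipsplit = row[1].split('--')[-1]
# writerow expects a sequence of fields, so wrap the single value in a list
writer.writerow([fipsplit])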
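To illustrate the split/rsplit point, here is a small sketch on one of the sample strings. The string "a--b--c" is a made-up example, used only to show where maxsplit makes the two methods differ:

```python
text = "Dallas-Plano-Irving TX (MSAD)--19100"

# With a single "--" in the string, split and rsplit return the same list.
print(text.split('--'))
print(text.rsplit('--'))

# With maxsplit=1 the scan direction matters once the separator repeats.
repeated = "a--b--c"
print(repeated.split('--', 1))   # ['a', 'b--c']
print(repeated.rsplit('--', 1))  # ['a--b', 'c']

# rpartition splits around the last "--" and always returns three parts.
name, sep, code = text.rpartition('--')
print(code)
```

rpartition can be convenient here because it never raises on unpacking, even when the separator is missing.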
Upvotes: 0
Reputation: 31895
Taking the numeric values as an example:
import re

# Raw string so that \d reaches the regex engine unchanged.
DOUBLE_DASH = r"(?<=--)\d+"

def grab_numeric(line, pattern=DOUBLE_DASH):
    result = re.search(pattern, line)
    num = result.group(0) if result else None
    return num

with open("sample.csv") as inputs:
    for line in inputs:
        result = grab_numeric(line)
        print(result)
Put your content into sample.csv and the code into test.py, then run:
python test.py
Output:
17460
19100
Grabbing OH and TX is similar: just do some research yourself and replace the pattern I provide. Hope it helps.
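One possible replacement pattern for the state part, as a sketch only. It assumes the state starts with two uppercase letters preceded by whitespace and that everything from there up to -- belongs to it; the names STATE and grab are made up for illustration, and real data may need a stricter pattern:

```python
import re

# Assumption: state = two uppercase letters after whitespace, plus any
# trailing text (such as " (MSAD)") up to the "--" separator.
STATE = r"(?<=\s)[A-Z]{2}\b.*?(?=--)"
DOUBLE_DASH = r"(?<=--)\d+"

def grab(line, pattern):
    """Return the first match of pattern in line, or None."""
    result = re.search(pattern, line)
    return result.group(0) if result else None

lines = [
    "columnA,ColumnB,columnC",
    "1,Cleveland-Elyria-Mentor OH--17460",
    "2,Dallas-Plano-Irving TX (MSAD)--19100",
    "3,ASJDFJKDJ-kD-JE WA--21092",
]
for line in lines:
    print(grab(line, STATE), grab(line, DOUBLE_DASH))
```

The lookbehind for whitespace keeps a hyphenated fragment such as JE in the third row from being mistaken for the state.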
Upvotes: 3