Reputation: 5495
I have to parse the following file in Python:
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
I need to end up with the following variables (the first line is parsed as an example):
year = 2010
month = 03
day = 22
hour = 23
minute = 24
p1 = Decimal('1.355800')
p2 = Decimal('1.355900')
p3 = Decimal('1.355800')
p4 = Decimal('1.355900')
I have tried:
from decimal import Decimal
line = '20100322;232400;1.355800;1.355900;1.355800;1.355900;0'
year = line[:4]
month = line[4:6]
day = line[6:8]
hour = line[9:11]
minute = line[11:13]
p1 = Decimal(line[16:24])
p2 = Decimal(line[25:33])
p3 = Decimal(line[34:42])
p4 = Decimal(line[43:51])
print(year)
print(month)
print(day)
print(hour)
print(minute)
print(p1)
print(p2)
print(p3)
print(p4)
This works fine, but I am wondering whether there is an easier way to parse this (maybe using struct) that avoids counting each position manually.
Upvotes: 0
Views: 75
Reputation: 10809
from decimal import Decimal
from datetime import datetime
line = "20100322;232400;1.355800;1.355900;1.355800;1.355900;0"
tokens = line.split(";")
dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
decimals = [Decimal(string) for string in tokens[2:6]]
# datetime objects also have some useful attributes: dt.year, dt.month, etc.
print(dt, *decimals, sep="\n")
Output:
2010-03-22 23:24:00
1.355800
1.355900
1.355800
1.355900
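If you need the individual variables from the question for every line of the file, the same split-based approach can be wrapped in a small function. This is only a sketch: `parse_line` and the `lines` list are names I made up for illustration, and it assumes every record has the same seven semicolon-separated fields as the sample.

```python
from datetime import datetime
from decimal import Decimal


def parse_line(line):
    """Split one semicolon-separated record into a datetime and four Decimals."""
    tokens = line.strip().split(";")
    # Date and time are the first two fields; join them for a single strptime call.
    dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
    # Fields 2..5 are the four prices; the trailing field is ignored here.
    prices = [Decimal(token) for token in tokens[2:6]]
    return dt, prices


# Sample records copied from the question; in practice these would come from the file.
lines = [
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0",
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0",
    "20100322;232600;1.355800;1.355800;1.355800;1.355800;0",
]

for line in lines:
    dt, (p1, p2, p3, p4) = parse_line(line)
    year, month, day, hour, minute = dt.year, dt.month, dt.day, dt.hour, dt.minute
    print(year, month, day, hour, minute, p1, p2, p3, p4)
```

Note that `dt.month` is the integer 3 rather than the string "03"; use `f"{dt.month:02d}"` if you need the zero-padded form.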
Upvotes: 2
Reputation: 1007
You could use regex:
import re
to_parse = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""
stx = re.compile(
r'(?P<date>(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}));'
r'(?P<time>(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2}));'
r'(?P<p1>[\.\-\d]*);(?P<p2>[\.\-\d]*);(?P<p3>[\.\-\d]*);(?P<p4>[\.\-\d]*)'
)
f = [
    {k: float(v) if 'p' in k else int(v) for k, v in a.groupdict().items()}
    for a in stx.finditer(to_parse)
]
print(f)
Output:
[{'date': 20100322,
'day': 22,
'hour': 23,
'minute': 24,
'month': 3,
'p1': 1.3558,
'p2': 1.3559,
'p3': 1.3558,
'p4': 1.3559,
'second': 0,
'time': 232400,
'year': 2010},
{'date': 20100322,
'day': 22,
'hour': 23,
'minute': 25,
'month': 3,
'p1': 1.3558,
'p2': 1.3559,
'p3': 1.3558,
'p4': 1.3559,
'second': 0,
'time': 232500,
'year': 2010},
{'date': 20100322,
'day': 22,
'hour': 23,
'minute': 26,
'month': 3,
'p1': 1.3558,
'p2': 1.3558,
'p3': 1.3558,
'p4': 1.3558,
'second': 0,
'time': 232600,
'year': 2010}]
Here I stored everything in a list, but you could instead iterate over the results of finditer
one match at a time if you don't want to keep every parsed record in memory.
You can also replace float and/or int with Decimal if needed.
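For what it's worth, here is a rough sketch of that lazy variant, with Decimal used for the prices. The names `PATTERN`, `iter_records` and `sample` are mine, not part of the code above, and I dropped the outer `date` and `time` groups to keep it short. It also assumes each price field is non-empty, so it uses `+` instead of `*`.

```python
import re
from decimal import Decimal

# Same idea as the pattern above, without the outer date/time groups.
PATTERN = re.compile(
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2});'
    r'(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2});'
    r'(?P<p1>[.\-\d]+);(?P<p2>[.\-\d]+);(?P<p3>[.\-\d]+);(?P<p4>[.\-\d]+)'
)


def iter_records(text):
    """Yield one dict per matched record instead of building a full list."""
    for match in PATTERN.finditer(text):
        yield {
            # Price groups are named p1..p4; everything else is an integer field.
            key: Decimal(value) if key.startswith("p") else int(value)
            for key, value in match.groupdict().items()
        }


sample = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""

for record in iter_records(sample):
    print(record["hour"], record["minute"], record["p1"])
```

The input string itself still has to be in memory for finditer; for a large file you could read it line by line and call `PATTERN.match` on each line instead.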
Upvotes: 0