Juan Sebastian Eljach

Reputation: 11

Reading data from files in python with intelligent format

I have this data table:

[image of the data table]

Ciudad means City, Fase means Phase and Tarea means Task.

The data table is saved into a file with this format:

Giron 20 15,18 40 50 60,77 37 45
Floridablanca 17 13,17 35 43 55,67 39 46
Bogota 15 12,17 35 43 55,67 39 46
Cali 14 12,10 30 40 32,59 67 33

The numbers are in millions (20 million, 18 million, etc.).

Each city is a line. Phases are delimited by "," and tasks are delimited by a space.

I need to read this file from Python and be able to work with the tasks and phases of every single city: calculate which task is the most expensive in a city, which phase is the most expensive, etc.

The problem is that I don't really know how to read and store the data so that I can start calculating what I need.

I have been trying 1D and 2D arrays with NumPy (loadtxt, genfromtxt), but the output is not clear and I can't figure out how to work with it.

Upvotes: 1

Views: 86

Answers (2)

aghast

Reputation: 15310

This is a simple parsing task, but you have to approach it in stages. First split each line into the city and its costs, then split the costs into phases, then split each phase into tasks.

Try this:

#!python3
import io

File = """Giron 20 15,18 40 50 60,77 37 45
Floridablanca 17 13,17 35 43 55,67 39 46
Bogota 15 12,17 35 43 55,67 39 46
Cali 14 12,10 30 40 32,59 67 33
"""

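# Maps each city to {"Fase N": [task costs]}, with phases numbered from 1.
Ciudades = {}

with io.StringIO(File) as infile:
    for line in infile:
        if line.strip():
            # Split the city name from the rest of the line.
            ciudad, costs = line.strip().split(' ', 1)
            Ciudades[ciudad] = fases = {}
            # Phases are comma-separated; start=1 makes "Fase 2" the second phase.
            for fase, tareacosts in enumerate(costs.split(','), start=1):
                fn = "Fase {}".format(fase)
                # Tasks within a phase are space-separated.
                fases[fn] = list(map(int, tareacosts.split()))

print("Most expensive tarea in Bogota Fase 2 is:",
      max(Ciudades['Bogota']['Fase 2']))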
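
Once the data is parsed into nested lists, the calculations from the question mostly reduce to max() and sum(). The following is only a sketch under a few assumptions: parse_line, most_expensive_task and most_expensive_phase are hypothetical helper names, city names are a single word (as in the sample data), and the cost of a phase is taken to be the sum of its tasks.

```python
DATA = """Giron 20 15,18 40 50 60,77 37 45
Floridablanca 17 13,17 35 43 55,67 39 46
Bogota 15 12,17 35 43 55,67 39 46
Cali 14 12,10 30 40 32,59 67 33
"""

def parse_line(line):
    """Split one 'City t t,t t t,...' line into (city, list of phases)."""
    # Assumption: the city name is a single word, so the first space ends it.
    city, costs = line.strip().split(' ', 1)
    phases = [[int(task) for task in phase.split()] for phase in costs.split(',')]
    return city, phases

cities = dict(parse_line(line) for line in DATA.splitlines() if line.strip())

def most_expensive_task(phases):
    """Return (cost, phase_number, task_number), numbering from 1."""
    return max(
        (cost, p, t)
        for p, phase in enumerate(phases, start=1)
        for t, cost in enumerate(phase, start=1)
    )

def most_expensive_phase(phases):
    """Return (total_cost, phase_number); a phase costs the sum of its tasks."""
    return max((sum(phase), p) for p, phase in enumerate(phases, start=1))

for city, phases in cities.items():
    print(city, most_expensive_task(phases), most_expensive_phase(phases))
```

Because the tuples are compared element by element, a tie on cost is broken in favour of the later phase or task; adjust that if a different rule is needed.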

Upvotes: 0

Alex Hall

Reputation: 36043

import re
line = 'Santa Rosa de Cabal 20 15,18 40 50 60,77 37 45'
city, phase1, phase2, phase3 = re.match(
    r'(.+) (\d+ \d+),(\d+ \d+ \d+ \d+),(\d+ \d+ \d+)', line).groups()

def tasks(phase_string):
    return [int(task) for task in phase_string.split()]

print(city)
for phase in phase1, phase2, phase3:
    print(tasks(phase))

Output:

Santa Rosa de Cabal
[20, 15]
[18, 40, 50, 60]
[77, 37, 45]

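The main thing here is the regular expression: the first group captures the city name, even when it contains spaces, and each of the following groups captures one phase. Regular expressions are worth reading about.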
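
The pattern above fixes the number of tasks per phase at two, four and three. If that may vary, one possible variation is to let the regular expression only separate the city name from the numbers, and then split the numbers with str.split. This is a sketch, not a definitive implementation: LINE_RE and parse_city are hypothetical names, and it assumes that city names contain no digits.

```python
import re

# Assumption: the city name contains no digits; everything after it is
# phases separated by ',' with tasks separated by spaces.
LINE_RE = re.compile(r'(\D+) ([\d ,]+)$')

def parse_city(line):
    """Return (city, phases) where phases is a list of lists of ints."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised line: {!r}".format(line))
    city, costs = match.groups()
    phases = [[int(task) for task in phase.split()] for phase in costs.split(',')]
    return city, phases

city, phases = parse_city('Santa Rosa de Cabal 20 15,18 40 50 60,77 37 45')
print(city)
for phase in phases:
    print(phase)
```

Raising ValueError on a line that does not match makes malformed input visible instead of silently skipping it.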

Upvotes: 3
