Build a DataFrame out of .txt file

Question

I am trying to build a data frame out of a .txt file. The .txt file (which is a parameter I use in another program) looks like this:

a('s1','0')=1;
a('s2','0')=1;
a('s2','18')=1;
a('s3','0')=1;
a('s3','19')=1;
a('s3','21')=1;
a('s4','0')=1;
a('s4','20')=1;

Where s1, s2, s3... are routes, so route s1, route s2, route s3... for flights. And the following, '0', '18', '19' are id's of flights. So basically, route s1 is just flight '0', but route s3 goes first flight '0' followed by flight '19' and finally flight '21'.

What I want to extract from this txt file is a dataframe containing the route (s1, s2, s...) and the first and last flight operated in that route. So if route s10 has 5 flights, I only want the first and last flight of the route.

Is there a way in python or maybe pandas?

accdias · Accepted Answer

It looks like an easy task for regex:

import re
import pandas as pd

extract_route = re.compile(r"a$'(\w+)','(\d+)'$=1;")

with open('data.txt', 'r') as f:
    routes = f.readlines()

df = pd.DataFrame([extract_route.split(route)[1:3] for route in routes],
    columns=['Route', 'FlightID'])

Complementing the answer, you can now group the series by 'Route':

groups = df.groupby('Route')

With the groups in place, you can get the first and last flights using:

>>> groups.first()
      FlightID
Route         
s1           0
s2           0
s3           0
s4           0
>>> groups.last()
      FlightID
Route         
s1           0
s2          18
s3          21
s4          20
>>>

The main advantage of this approach is that you do not lose the intermediate flights information, in case you need them in the future.

Build a DataFrame out of .txt file

Answers (2)

Related Questions