Reputation: 9451
I have a very ugly data import coming from a client, in a .net
file. I have managed to transform this to a list of lists. An example of a list is gven:
['* Table: Movement one\n',
'* \n',
'$TSYS:CODE;NAME;TYPE;PCU\n',
'A;Car;PrT;1.000\n',
'Air_Bus;Airport Bus;PuT;1.000\n',
'B;Bus;PuT;1.000\n',
'C;Company Bus;PrT;2.000\n',
'CB;City Bus;PuT;1.000\n',',
'FE;Ferry;PuT;1.000\n',
'GV1;2-Axle Rigid Goods Vehicle;PrT;1.500\n',
'GV2;3/4 Axle Rigid Goods Vehicle;PrT;2.000\n',
'GV3;3/4 Axle Artic Goods Vehicle;PrT;3.000\n',
'GV4;5+ Axle Artic Goods Vehicle;PrT;3.000\n',
'IB;Intercity Bus;PuT;1.000\n',
'IN;Industry Bus;PuT;1.000\n',
'Loc;Local Bus;PuT;1.000\n',
'LR;Light Rail;PuT;1.000\n',
'R;Rail;PuT;1.000\n',
'S;School Bus;PrT;2.000\n',
'T;Taxi;PrT;1.100\n',
'TR;Tram;PuT;1.000\n',
'W;Walk;PrT;0.000\n',
'WB;WaterBus;PuT;1.000\n',
'WT;Water Taxi;PuT;1.000\n',
'W_PuT;Walk_PuT;PuTWalk;1.000\n',
'\n',
'* \n']
I wish to load this into a pandas dataframe.
The top two lines and bottom two lines may be discarded. Each list contains a string record, with ;
separators. I know that the separator function for read_csv
exists, but this won't work here as I am not reading from a file at this point. The column headings are also complex. The first $TSYS
record must be discarded and the remaining used as column names. I can use strip
to remove the \n
in each record.
I have tried to simply load as a dataframe:
results_df = pd.DataFrame(results[2:-2])
print(results_df.head())
0
0 $TSYS:CODE;NAME;TYPE;PCU\n
1 A;Car;PrT;1.000\n
3 Air_Bus;Airport Bus;PuT;1.000\n
4 B;Bus;PuT;1.000\n
Since I have many of these lists, how do I programtically take the 3rd line, remove the first string and create column headers from the remaining? How do I correctly separate by the ;
for each subsequent record?
Upvotes: 1
Views: 113
Reputation: 863166
You can use list comprehension
where remove \n
by strip
and split
values:
results_df = pd.DataFrame([x.strip().split(';') for x in results[3:-2]])
results_df.columns = results[2].strip().split(';')
print(results_df)
$TSYS:CODE NAME TYPE PCU
0 A Car PrT 1.000
1 Air_Bus Airport Bus PuT 1.000
2 B Bus PuT 1.000
3 C Company Bus PrT 2.000
4 CB City Bus PuT 1.000
5 FE Ferry PuT 1.000
6 GV1 2-Axle Rigid Goods Vehicle PrT 1.500
7 GV2 3/4 Axle Rigid Goods Vehicle PrT 2.000
8 GV3 3/4 Axle Artic Goods Vehicle PrT 3.000
9 GV4 5+ Axle Artic Goods Vehicle PrT 3.000
10 IB Intercity Bus PuT 1.000
11 IN Industry Bus PuT 1.000
12 Loc Local Bus PuT 1.000
13 LR Light Rail PuT 1.000
14 R Rail PuT 1.000
15 S School Bus PrT 2.000
16 T Taxi PrT 1.100
17 TR Tram PuT 1.000
18 W Walk PrT 0.000
19 WB WaterBus PuT 1.000
20 WT Water Taxi PuT 1.000
21 W_PuT Walk_PuT PuTWalk 1.000
Upvotes: 1