Reputation: 583
I have a CSV file that is formatted like below.
@QWERTY
@Equipment01
@Datetime;A;B;C;D
21/02/2005 17:55;23;451;42;31;
21/02/2005 17:50;24;143;24;54;
21/02/2005 17:45;25;513;31;31;
@Equipment02
@Datetime;A;B;C;D
21/02/2005 17:55;43;1;42;58;
21/02/2005 17:50;14;3;65;51;
21/02/2005 17:45;3;3;91;53;
21/02/2005 17:40;31;35;13;31;
21/02/2005 17:35;34;54;61;5;
@PersonalGear01
@Datetime;A;B;C;D;E;F
21/02/2005 17:55;41;23;2;16;0;6;
21/02/2005 17:50;3;95;51;14;0;6;
21/02/2005 17:45;3;2;91;53;0;6;
@Equipment00
@Datetime;A;B;C;D
@PersonalGear02
@Datetime;A;B;C;D;E;F
21/02/2005 17:55;41;23;2;16;0;6;
21/02/2005 17:50;3;95;51;14;0;6;
21/02/2005 17:45;3;2;91;53;0;6;
Each equipment and personal gear will have delimiter datetime data rows. In some cases, there may be no datetime data row (e.g @Equipment00
). The number of datetime entries recorded may vary (e.g @Equipment02
has more datetime entries than @Equipment01
).
I will like to create multiple dataframes, based on the equipment and personal gears. The expected results based on the above example will be 4 dataframes (@Equipment01
, @Equipment02
, @PersonalGear01
, @Equipment00
).
Is there a pandas way of doing this?
Upvotes: 0
Views: 526
Reputation: 120499
You can use:
dfs = {}
with open('data.dat') as fp:
next(fp) # skip first line
data = []
name = next(fp)[1:].strip()
for row in fp:
# Parse column names
if row.startswith('@'):
headers = row[1:].strip().split(';')
# Accumulate data
else:
while not row.startswith('@'):
data.append(row.strip().split(';'))
row = next(fp)
dfs[name] = pd.DataFrame(data, columns=headers)
data = []
name = row[1:].strip()
dfs[name] = pd.DataFrame(data, columns=headers)
Output:
>>> dfs
{'Equipment01': Datetime A B C D
0 21/02/2005 17:55 23 451 42 31
1 21/02/2005 17:50 24 143 24 54
2 21/02/2005 17:45 25 513 31 31,
'Equipment02': Datetime A B C D
0 21/02/2005 17:55 43 1 42 58
1 21/02/2005 17:50 14 3 65 51
2 21/02/2005 17:45 3 3 91 53
3 21/02/2005 17:40 31 35 13 31
4 21/02/2005 17:35 34 54 61 5,
'PersonalGear01': Datetime A B C D E F
0 21/02/2005 17:55 41 23 2 16 0 6
1 21/02/2005 17:50 3 95 51 14 0 6
2 21/02/2005 17:45 3 2 91 53 0 6,
'Equipment00': Empty DataFrame
Columns: [Datetime, A, B, C, D]
Index: []}
>>> dfs.keys()
dict_keys(['Equipment01', 'Equipment02', 'PersonalGear01', 'Equipment00'])
>>> dfs['Equipment02']
Datetime A B C D
0 21/02/2005 17:55 43 1 42 58
1 21/02/2005 17:50 14 3 65 51
2 21/02/2005 17:45 3 3 91 53
3 21/02/2005 17:40 31 35 13 31
4 21/02/2005 17:35 34 54 61 5
Upvotes: 2