Reputation: 25
I have the following data in a single txt file:
00001+++00001 000031 12.8600 -1 7 BEAR
1990052418 276.00 18.80 0.00 12.86
1990052500 276.00 19.70 0.00 12.86
00002+++00002 000047 30.8700 995 22 LION
1990072206 318.10 8.80 1010.00 12.86
1990072212 316.80 8.90 1010.00 12.86
1990072218 315.40 9.30 1010.00 12.86
1990072300 313.60 9.30 1010.00 12.86
00003+++00003 000050 36.0100 973 37 BIRD
1990072412 285.00 34.00 1012.00 10.29
1990072418 286.20 33.60 1013.00 10.29
1990072500 287.30 33.00 1013.00 10.29
How can I read each (unequal) chunk under each header separately in Python (Pandas)?
Upvotes: 1
Views: 630
Reputation: 375675
Here's a solution which parses each block (without reading in the entire file):
def read_fwfs(text_file):
res = {}
with open(text_file) as f:
header = f.readline()
block = []
for line in f:
if "+++" in line:
# End of block, so save it to dict as DataFrame.
# Note: this next line could be pulled out as a function.
res[header.split("+++")[0]] = pd.read_fwf(StringIO("".join(block)), header=None)
# Reset variables.
header = line
block = []
# Ignore blank lines.
elif line != "\n":
block.append(line)
# Save the last block.
# Note: See what I mean about being a function? Here it is again:
res[header.split("+++")[0]] = pd.read_fwf(StringIO("".join(block)), header=None)
return res
In [11]: d = read_fwfs("my_text_file.txt")
d
Out[11]:
{'00001': 0 1 2 3 4
0 1990052418 276 18.8 0 12.86
1 1990052500 276 19.7 0 12.86,
'00002': 0 1 2 3 4
0 1990072206 318.1 8.8 1010 12.86
1 1990072212 316.8 8.9 1010 12.86
2 1990072218 315.4 9.3 1010 12.86
3 1990072300 313.6 9.3 1010 12.86,
'00003': 0 1 2 3 4
0 1990072412 285.0 34.0 1012 10.29
1 1990072418 286.2 33.6 1013 10.29
2 1990072500 287.3 33.0 1013 10.29}
In [12]: d["00003"]
Out[12]:
0 1 2 3 4
0 1990072412 285.0 34.0 1012 10.29
1 1990072418 286.2 33.6 1013 10.29
2 1990072500 287.3 33.0 1013 10.29
If you want to do something more with the splitting line, e.g. use it as a header, you could add that into the read_fwf part e.g. by adding the second half of the split header to the joined block. As I mention inline, it's probably a good idea to pull that part out as a function anyway.
Upvotes: 2