Reputation: 11
I wish to import the data found here https://www.asx.com.au/data/shortsell.txt, so that it becomes a 691x7 table. How to I tell it to recognise the different columns?
Thanks in advance
Upvotes: 1
Views: 66
Reputation: 11321
You most likely have solved your problem by now. If not, here's a suggestion: You could write some regex to read the lines, but that's way too much hassle in my opinion. It seems to me that the column-defining feature is their sizes. So you have to count a bit and the rest is straightforward. Make a first count and then visualise your result by doing something like this (that's code you're going to need later anyway):
with open('shortsell.txt', 'r') as file:
[next(file) for _ in range(5)] # Skip the first 5 rows
for line in file:
print(line[:8] + '|' + line[8:42] + '|' + line[42:56] + '|'
+ line[56:71] + '|' + line[71:90] + '|' + line[90:].rstrip())
Once you have the right sizes:
with open('shortsell.txt', 'r') as file:
[next(file) for _ in range(5)] # Skip the first 5 rows
# Read the columns parts
columns = [[line[:8].strip(), line[8:42].strip(), line[42:56].strip(),
line[56:71].strip(), line[71:90].strip(), line[90:].strip()]
for line in (next(file), next(file), next(file))]
# Join the parts
columns = [' '.join(columns[i][j] for i in range(3)).strip() for j in
range(6)]
# Read the data and cast to fitting type
data = [[line[:8].strip(),
line[8:42].strip(),
line[42:56].strip(),
int(line[56:71].strip().replace(',', '')),
int(line[71:90].strip().replace(',', '')),
float(line[90:].strip().replace(',', ''))]
for line in file]
Result:
['ASX Code',
'Company Name',
'Product/ Class',
'Reported Gross Short Sales (a) ASX + CHI-X',
'Issued Capital (b)',
'% of issued capital reported as short sold (a)/(b)']
[['360', 'LIFE360 INC.', 'CDI FORUS', 8999, 148866201, 0.0],
['3DP', 'POINTERRA LIMITED', 'FPO', 15213, 670733112, 0.0],
['4DS', '4DS MEMORY LIMITED', 'FPO', 15000, 1310693486, 0.0],
...
['ZEL', 'Z ENERGY LIMITED.', 'FPO NZX', 23255, 520476853, 0.0],
['ZLD', 'ZELIRA THERAPEUTICS LIMITED', 'FPO', 101860, 1185322966, 0.0],
['ZNO', 'ZOONO GROUP LIMITED', 'FPO', 67213, 163612707, 0.04]]
Another option would be to use Pandas' pd.read_fwf()
, a function designed to read fixed-width files:
import pandas as pd
with open('shortsell.txt', 'r') as file:
[next(file) for _ in range(5)] # Skip the first 5 rows
df = pd.read_fwf(file, header=None)
df.columns = (
df.iloc[0:3, :]
.fillna("").astype(str)
.apply(lambda col: " ".join(col.str.strip()).strip(), axis=0)
)
data = df.iloc[3:, :].values.tolist()
Upvotes: 2