Reputation: 19
I have a .txt file with the following format:
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt
I'd like to import this file into a dataframe, with each '|'-separated field in its own column and each line as a new row. I have experience importing .csv and other well-formatted files into dataframes, but I have never dealt with something this messy. If you'd like the .txt file to play around with, let me know.
Thanks for the help in advance.
Upvotes: 1
Views: 57
Reputation: 210972
Assuming you have the following text file:
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt
Solution:
import pandas as pd
df = pd.read_csv(filename, sep='|', skiprows=[1], parse_dates=['Date Filed'])
Result:
In [94]: df
Out[94]:
       CIK            Company Name  Form Type  Date Filed                                     Filename
0  1000032           BINCH JAMES G          4  2016-11-07  edgar/data/1000032/0001209191-16-148633.txt
1  1000032           BINCH JAMES G          4  2016-12-02  edgar/data/1000032/0001209191-16-153119.txt
2  1000045  NICHOLAS FINANCIAL INC       10-Q  2016-11-09  edgar/data/1000045/0001193125-16-763849.txt
3  1000045  NICHOLAS FINANCIAL INC          4  2016-10-04  edgar/data/1000045/0001000045-16-000006.txt
In [95]: df.dtypes
Out[95]:
CIK                      int64
Company Name            object
Form Type               object
Date Filed      datetime64[ns]
Filename                object
dtype: object
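If you want to try this without the file on disk, here is a minimal self-contained sketch that feeds the sample lines from the question through the same `read_csv` call via `io.StringIO`. The helper name `read_pipe_file` and the `SAMPLE` constant are made up for illustration, and the sketch assumes the dashed rule is always the second line of the file (0-based index 1), as in the sample.

```python
import io

import pandas as pd

# Sample lines copied from the question; a real file path works the same way.
SAMPLE = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt
"""


def read_pipe_file(source):
    """Hypothetical helper: load a '|'-delimited file whose second line is a dashed rule."""
    return pd.read_csv(
        source,
        sep="|",                     # fields are separated by '|', not ','
        skiprows=[1],                # skip only the dashed rule (0-based line index 1)
        parse_dates=["Date Filed"],  # parse this column as datetimes instead of strings
    )


df = read_pipe_file(io.StringIO(SAMPLE))
print(df.shape)   # (4, 5): four data rows, five columns
print(df.dtypes)
```

Note that `skiprows=[1]` (a list) skips just that one line, whereas `skiprows=1` (an integer) would skip the first line, which is the header.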
Upvotes: 1