JRE0714

Reputation: 19

Need help formatting a .txt file and placing into a data frame

I have a .txt file with the following format:

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt

I'd like to import this file into a dataframe, with each '|'-separated field in its own column and each line as a new row. I have experience importing .csv and other well-formatted files into dataframes, but have never dealt with something this messy. If you'd like the .txt file to play around with, let me know.

Thanks for the help in advance.

Upvotes: 1

Views: 57

Answers (1)

MaxU - stand with Ukraine

Reputation: 210972

Assuming you have the following text file:

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000032|BINCH JAMES G|4|2016-12-02|edgar/data/1000032/0001209191-16-153119.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
1000045|NICHOLAS FINANCIAL INC|4|2016-10-04|edgar/data/1000045/0001000045-16-000006.txt

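Solution (skiprows=[1] skips the dashed line under the header; parse_dates converts 'Date Filed' to datetime):

import pandas as pd
df = pd.read_csv(filename, sep='|', skiprows=[1], parse_dates=['Date Filed'])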

Result:

In [94]: df
Out[94]:
       CIK            Company Name Form Type Date Filed                                     Filename
0  1000032           BINCH JAMES G         4 2016-11-07  edgar/data/1000032/0001209191-16-148633.txt
1  1000032           BINCH JAMES G         4 2016-12-02  edgar/data/1000032/0001209191-16-153119.txt
2  1000045  NICHOLAS FINANCIAL INC      10-Q 2016-11-09  edgar/data/1000045/0001193125-16-763849.txt
3  1000045  NICHOLAS FINANCIAL INC         4 2016-10-04  edgar/data/1000045/0001000045-16-000006.txt

In [95]: df.dtypes
Out[95]:
CIK                      int64
Company Name            object
Form Type               object
Date Filed      datetime64[ns]
Filename                object
dtype: object
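
To try this without the file on disk, here is a minimal sketch that feeds two of the sample rows through io.StringIO in place of a real filename. The dtype={'CIK': str} argument is not part of the solution above; it is an optional assumption for cases where you would rather keep CIK as text than as int64.

```python
import io
import pandas as pd

# Sample rows copied from the question; StringIO stands in for the real file.
sample = """CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000032|BINCH JAMES G|4|2016-11-07|edgar/data/1000032/0001209191-16-148633.txt
1000045|NICHOLAS FINANCIAL INC|10-Q|2016-11-09|edgar/data/1000045/0001193125-16-763849.txt
"""

df = pd.read_csv(
    io.StringIO(sample),
    sep="|",
    skiprows=[1],                # drop the dashed separator (0-based line index 1)
    parse_dates=["Date Filed"],  # convert the date column to datetime
    dtype={"CIK": str},          # assumption: keep CIK as text, not int64
)

print(df.dtypes)
```

Drop the dtype argument to get the int64 CIK column shown in the output above.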

Upvotes: 1
