Jasonli1997
Jasonli1997

Reputation: 53

How to read txt into data frame

I am trying to read some web txt data into a dataframe and I kept running into "multi-char delimiter" issue. Here is a link to the data: https://ticdata.treasury.gov/Publish/mfh.txt

from urllib.request import urlretrieve
import pandas as pd

url = 'https://ticdata.treasury.gov/Publish/mfh.txt'
urlretrieve(url, 'US Treasurys Holders.txt')

names = ['Jul 2019', 'Jun 2019', 'May 2019', 'Apr 2019', 'Mar 2019', 'Feb 2019', 'Jan 2019', 'Dec 2018', 'Nov 2018',
         'Oct 2018', 'Sep 2018', 'Aug 2018', 'Jul 2018']

data = pd.read_csv('US Treasurys Holders.txt', skipfooter=21, engine='python', skiprows=11, names=names)

I used sep='\s+' but the code kept showing me:

Error could possibly be due to quotes being ignored when a multi-char delimiter is used

Upvotes: 1

Views: 63

Answers (1)

metasomite
metasomite

Reputation: 140

The issue stems from the presence of spaces within the country names for the file you're trying to open. Try using a delimiter with a minimum of two spaces:

data = pd.read_csv(
    "US Treasurys Holders.txt",
    skipfooter=21,
    engine="python",
    sep="\s\s+",
    skiprows=11,
    names=names,
)

Which gives what I think to be the correct output (here a subset of 5 cols x 5 rows):

Jul 2019    Jun 2019    May 2019    Apr 2019    Mar 2019
Japan   1130.8  1122.9  1101.0  1064.0  1078.1
China, Mainland 1110.3  1112.5  1110.2  1113.0  1120.5
United Kingdom  334.7   341.1   323.1   300.8   317.1
Brazil  309.9   311.7   305.7   306.7   311.7
Ireland 258.2   262.1   270.7   269.7   277.6

Upvotes: 1

Related Questions