Py1996

Reputation: 239

Unable to convert text format to proper data frame using Pandas

I am reading a text source from URL = 'https://www.census.gov/construction/bps/txt/tb2u201901.txt'

Here I used pandas to convert it into a DataFrame:

df = pd.read_csv(URL, sep = '\t')

After exporting the DataFrame, I see that all the columns are merged into a single column, in spite of giving the separator as '\t'. How can I solve this issue?


Upvotes: 0

Views: 181

Answers (2)

Pierre-Loic

Reputation: 1564

Your file is not a CSV file, so you should use the pandas function read_fwf(), because your columns have a fixed width. You also need to skip the first 12 lines, which are not part of your data, and remove the empty lines with dropna().

df = pd.read_fwf(URL, skiprows=12)
df.dropna(inplace=True)
df.head()

United States   94439   58086   1600    1457    33296   1263
1   Northeast   9099.0  3330.0  272.0   242.0   5255.0  242.0
2   New England     1932.0  1079.0  90.0    72.0    691.0   46.0
3   Connecticut     278.0   202.0   8.0     3.0     65.0    8.0
4   Maine   357.0   222.0   6.0     0.0     129.0   5.0
5   Massachusetts   819.0   429.0   38.0    54.0    298.0   23.0
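
The read_fwf() approach can be tried offline on a small in-memory sample. This is only a sketch: `SAMPLE`, `load_fixed_width`, and the column names are made up for illustration and are not taken from the census file. It also passes `header=None` with explicit `names`, so the first data row is not consumed as the header, which seems to be what happened to the "United States" row in the output above.

```python
import io

import pandas as pd

# Hypothetical miniature of a fixed-width report: a title line, a blank
# line, then space-padded data rows with another blank line in between.
SAMPLE = (
    "Table 2u. Sample report title\n"
    "\n"
    "United States    94439   58086\n"
    "\n"
    "Northeast         9099    3330\n"
    "Connecticut        278     202\n"
)


def load_fixed_width(source, skiprows, names):
    """Read a fixed-width text source, skipping a preamble of `skiprows` lines."""
    # header=None keeps the first data row as data; names labels the columns.
    df = pd.read_fwf(source, skiprows=skiprows, header=None, names=names)
    # Drop any rows that came through as all-missing, then renumber the index.
    return df.dropna().reset_index(drop=True)


df = load_fixed_width(
    io.StringIO(SAMPLE),
    skiprows=2,  # title line + blank line in this sample
    names=["area", "total", "one_unit"],  # assumed labels, not the real header
)
print(df)
```

For the real file, the same call would take the URL instead of the `io.StringIO` object, with `skiprows` and `names` adjusted to match the report's actual preamble and columns.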

Upvotes: 1

Vivs

Reputation: 475

Your output is correct. If you open the URL, you will see that it contains sentences that are not tab-separated, so pandas cannot split them into columns.
From line number 9 onward, the results are correct.
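
A quick way to confirm this kind of problem is to check whether the text contains any tab characters at all. The sketch below uses a made-up, space-padded sample (`SAMPLE` is hypothetical, not the census file) to show that `sep='\t'` produces a single column when there are no tabs to split on.

```python
import io

import pandas as pd

# Hypothetical space-padded text: the columns are aligned with spaces, not tabs.
SAMPLE = "name    total\nalpha      10\nbeta        7\n"

# No tab characters are present, so a tab separator has nothing to split on.
has_tabs = "\t" in SAMPLE

# Each line is therefore read as one field, giving a one-column DataFrame.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

print(has_tabs)
print(df.shape)
```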

[![enter image description here][1]][1]


  [1]: https://i.sstatic.net/2K61J.png

Upvotes: 0
