Unable to convert text format to proper data frame using Pandas

Question

I am reading a text file from the following URL:

URL = 'https://www.census.gov/construction/bps/txt/tb2u201901.txt'

I used pandas to convert it into a DataFrame:

df = pd.read_csv(URL, sep='\t')

After exporting the df, I see that all the columns are merged into a single column, in spite of specifying the separator. How can I solve this issue?

Pierre-Loic · Accepted Answer

Your file is not a CSV file. Its columns have a fixed width, so you should use the pandas function read_fwf() instead of read_csv(). You also need to skip the first 12 lines, which are not part of your data, and remove the empty lines with dropna().

import pandas as pd
df = pd.read_fwf(URL, skiprows=12)  # skip the 12 lines before the data
df.dropna(inplace=True)             # drop the rows that came from empty lines
df.head()

   United States    94439    58086     1600     1457    33296     1263
1  Northeast       9099.0   3330.0    272.0    242.0   5255.0    242.0
2  New England     1932.0   1079.0     90.0     72.0    691.0     46.0
3  Connecticut      278.0    202.0      8.0      3.0     65.0      8.0
4  Maine            357.0    222.0      6.0      0.0    129.0      5.0
5  Massachusetts    819.0    429.0     38.0     54.0    298.0     23.0
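
Note that in the output above the first data row (United States) has ended up as the column header, because read_fwf() treats the first line it reads as a header by default. A minimal sketch of one way to avoid that is to pass header=None together with your own column names. The sample text, the column widths, and the names region, value_1 and value_2 below are all made up for illustration and are not taken from the Census file; it runs on an in-memory string so it does not need network access.

```python
import io

import pandas as pd

# Hypothetical rows; the numbers are borrowed from the output above for illustration.
rows = [
    ("United States", 94439, 58086),
    ("Northeast", 9099, 3330),
    ("New England", 1932, 1079),
]

# Build a small fixed-width text: 2 preamble lines, then data rows
# (16-character name, two 8-character numbers) separated by blank lines.
preamble = "Hypothetical report title\nUnits: count\n"
body = "\n\n".join(f"{name:<16}{v1:>8}{v2:>8}" for name, v1, v2 in rows)
sample = preamble + body + "\n"

df = pd.read_fwf(
    io.StringIO(sample),
    widths=[16, 8, 8],  # explicit column widths instead of inferred ones
    skiprows=2,         # skip the preamble lines
    header=None,        # the first remaining line is data, not a header
    names=["region", "value_1", "value_2"],  # assumed column names
)

# Drop rows that are entirely empty (from blank lines), then renumber the index.
df = df.dropna(how="all").reset_index(drop=True)
print(df)
```

For the real file you would keep skiprows=12 as in the answer and either let read_fwf() infer the column boundaries or pass widths that match the report layout.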
