.dat file import in pandas

Question

I want to import this publicly available file using pandas, reading it simply as a CSV (I renamed the .dat file to .csv):

import pandas as pd
clinton = pd.read_csv("C:/Users/Mateusz/Downloads/ML_DS-20180523T193457Z-001/ML_DS/clinton1.csv")

However, in some cases the county name is made up of two words rather than one, and in those rows the data is shifted one column to the right. For example, the name Hot Springs ends up split across two columns. How can I fix this for the entire dataset at once?

Scott Boston · Accepted Answer

There is no need to rename the .dat file to .csv. Instead, you can use a regex that matches two or more whitespace characters as the column separator.
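To see why "two or more spaces" works as a delimiter, here is a small sketch using Python's `re.split` on a single hypothetical line. It assumes fields are separated by at least two spaces while names contain only single spaces; the numbers are made up for illustration.

```python
import re

# Hypothetical line: a two-word county name, then numeric fields
# separated by two or more spaces (values are illustrative only).
line = "Hot Springs, AR    30.92  31.7  57623"

# Splitting on any run of whitespace breaks "Hot Springs" into two fields.
naive = re.split(r"\s+", line)

# Splitting on two or more whitespace characters keeps the name intact.
fixed = re.split(r"\s\s+", line)

print(naive)  # ['Hot', 'Springs,', 'AR', '30.92', '31.7', '57623']
print(fixed)  # ['Hot Springs, AR', '30.92', '31.7', '57623']
```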

Try using the sep parameter (a raw string avoids Python's invalid-escape warning for '\s'):

import pandas as pd
pd.read_csv('http://users.stat.ufl.edu/~winner/data/clinton1.dat',
            header=None, sep=r'\s\s+', engine='python')

Output:

            0      1     2      3      4     5      6      7     8     9    10
0  Autauga, AL  30.92  31.7  57623  15768  15.2  10.74  51.41  60.4  2.36  457
1  Baldwin, AL  26.24  35.5  84935  16954  13.6   9.73  51.34  66.5  5.40  282
2  Barbour, AL  46.36  32.8  83656  15532  25.0   8.82  53.03  28.8  7.02   47
3   Blount, AL  32.92  34.5  61249  14820  15.0   9.67  51.15  62.4  2.36  185
4  Bullock, AL  67.67  31.7  75725  11120  33.0   7.08  50.76  17.6  2.91  141
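The same call can be tried offline on an in-memory sample. This is a minimal sketch assuming the file's layout (at least two spaces between fields, single spaces inside names); the two rows below are hypothetical and not copied from clinton1.dat.

```python
import io

import pandas as pd

# Hypothetical rows in the same whitespace-delimited layout
# (values are illustrative only).
sample = (
    "Autauga, AL    30.92  31.7  57623\n"
    "Hot Springs, AR  25.00  33.1  60000\n"
)

# Two or more whitespace characters act as the delimiter, so the single
# space inside "Hot Springs, AR" does not start a new column.
# A regex separator requires the python engine.
df = pd.read_csv(io.StringIO(sample), header=None,
                 sep=r"\s\s+", engine="python")

print(df)
print(df.shape)  # (2, 4)
```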

If you want the state as a separate column, you can use sep=r'\s\s+|,', which means: separate columns on two or more whitespace characters OR a comma.

pd.read_csv('http://users.stat.ufl.edu/~winner/data/clinton1.dat',
            header=None, sep=r'\s\s+|,', engine='python')

Output:

        0    1      2     3      4        5     6      7      8     9     10     11
0  Autauga   AL  30.92  31.7  57623  15768.0  15.2  10.74  51.41  60.4  2.36  457.0
1  Baldwin   AL  26.24  35.5  84935  16954.0  13.6   9.73  51.34  66.5  5.40  282.0
2  Barbour   AL  46.36  32.8  83656  15532.0  25.0   8.82  53.03  28.8  7.02   47.0
3   Blount   AL  32.92  34.5  61249  14820.0  15.0   9.67  51.15  62.4  2.36  185.0
4  Bullock   AL  67.67  31.7  75725  11120.0  33.0   7.08  50.76  17.6  2.91  141.0
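One detail worth checking with the comma alternative: the regex consumes the comma but not the single space that follows it, so the state column may come back as ' AL' rather than 'AL' (the padding is not visible in the printed frame). A sketch, again on hypothetical rows, that cleans it with `Series.str.strip`:

```python
import io

import pandas as pd

# Hypothetical rows in the same layout (values are illustrative only).
sample = (
    "Autauga, AL    30.92  31.7  57623\n"
    "Hot Springs, AR  25.00  33.1  60000\n"
)

# Split on two or more whitespace characters OR a comma.
df = pd.read_csv(io.StringIO(sample), header=None,
                 sep=r"\s\s+|,", engine="python")

# Remove any whitespace left around the state code after the comma split.
df[1] = df[1].str.strip()

print(df[[0, 1]])
```

Alternatively, a separator such as r'\s\s+|,\s*' absorbs the space after the comma at parse time, so no post-processing is needed.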
