Reputation: 153
I want to import this publicly available file using pandas, simply as a CSV (I just renamed the file from .dat to .csv):
clinton = pd.read_csv("C:/Users/Mateusz/Downloads/ML_DS-20180523T193457Z-001/ML_DS/clinton1.csv")
However, in some cases the county name consists of two words rather than one, and in those rows the data is shifted one column to the right (for example, the name "Hot Springs" ends up split across two columns). How can I fix this for the entire dataset at once?
Upvotes: 2
Views: 17336
Reputation: 45
You can use a regular expression as the separator. In your specific case, every delimiter is a run of two or more spaces, whereas the spaces inside the names are single spaces.
import pandas as pd
# Raw string so the backslash in the regex is passed through unchanged
clinton = pd.read_csv("clinton1.csv", sep=r'\s{2,}', header=None, engine='python')
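To see the effect without downloading anything, here is a minimal sketch that runs the same separator on a small in-memory sample. The two rows and their numbers are made up for illustration and only mimic the layout described in the question (fields separated by two or more spaces, single spaces inside names); they are not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical sample: invented values, same layout as described above
sample = (
    "Autauga, AL       30.92  31.7  57623\n"
    "Hot Springs, AR   40.10  38.2  91234\n"
)

# Split only on runs of two or more whitespace characters,
# so the single space inside "Hot Springs" is left alone
df = pd.read_csv(io.StringIO(sample), sep=r"\s{2,}", header=None, engine="python")

print(df)
```

Both rows should come out with the same number of columns, with the two-word name kept intact in the first one.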
Upvotes: 0
Reputation: 153460
No need to rename the .dat to .csv. Instead you can use a regex that matches two or more spaces as a column separator.
Try using the sep parameter:
pd.read_csv('http://users.stat.ufl.edu/~winner/data/clinton1.dat',
            header=None, sep=r'\s\s+', engine='python')
Output:
             0      1     2      3      4     5      6      7     8     9   10
0  Autauga, AL  30.92  31.7  57623  15768  15.2  10.74  51.41  60.4  2.36  457
1  Baldwin, AL  26.24  35.5  84935  16954  13.6   9.73  51.34  66.5  5.40  282
2  Barbour, AL  46.36  32.8  83656  15532  25.0   8.82  53.03  28.8  7.02   47
3   Blount, AL  32.92  34.5  61249  14820  15.0   9.67  51.15  62.4  2.36  185
4  Bullock, AL  67.67  31.7  75725  11120  33.0   7.08  50.76  17.6  2.91  141
If you want the state as a separate column, you can use sep=r'\s\s+|,', which splits columns on two or more whitespace characters OR on a comma.
pd.read_csv('http://users.stat.ufl.edu/~winner/data/clinton1.dat',
            header=None, sep=r'\s\s+|,', engine='python')
Output:
         0   1      2     3      4        5     6      7      8     9    10     11
0  Autauga  AL  30.92  31.7  57623  15768.0  15.2  10.74  51.41  60.4  2.36  457.0
1  Baldwin  AL  26.24  35.5  84935  16954.0  13.6   9.73  51.34  66.5  5.40  282.0
2  Barbour  AL  46.36  32.8  83656  15532.0  25.0   8.82  53.03  28.8  7.02   47.0
3   Blount  AL  32.92  34.5  61249  14820.0  15.0   9.67  51.15  62.4  2.36  185.0
4  Bullock  AL  67.67  31.7  75725  11120.0  33.0   7.08  50.76  17.6  2.91  141.0
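One thing to watch with splitting on a bare comma: the comma in "Autauga, AL" is followed by a single space, so the state values may keep a leading space (" AL"). As an alternative sketch, not part of the answer above, you could read the file with the two-or-more-spaces separator and then split the first column yourself. The sample rows and the column names county and state are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample: invented values, same layout as the real file is assumed to have
sample = (
    "Autauga, AL       30.92  31.7\n"
    "Hot Springs, AR   40.10  38.2\n"
)

df = pd.read_csv(io.StringIO(sample), sep=r"\s{2,}", header=None, engine="python")

# Split "County, ST" on the last comma, then trim surrounding whitespace
parts = df[0].str.rsplit(",", n=1, expand=True)
df.insert(0, "county", parts[0].str.strip())
df.insert(1, "state", parts[1].str.strip())

# Drop the original combined column (its label is the integer 0)
df = df.drop(columns=0)

print(df)
```

Splitting after the read keeps the number of fields per row under the control of a single separator pattern, and the strip calls take care of the space after the comma.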
Upvotes: 9