Reputation: 6488
I am trying to process a dataset to play with for data science, but it does not have column names. The output of df.head() is shown below:
   1  73                 Not in universe   0  0.1   0.2  Not in universe.1
0  2  58  Self-employed-not incorporated   4   34     0    Not in universe
1  3  18                 Not in universe   0    0     0        High school
2  4   9                 Not in universe   0    0     0    Not in universe
3  5  10                 Not in universe   0    0     0    Not in universe
4  6  48                         Private  40   10  1200    Not in universe
What I would like to see is:
0  1  73                 Not in universe   0  0.1   0.2  Not in universe.1
1  2  58  Self-employed-not incorporated   4   34     0    Not in universe
2  3  18                 Not in universe   0    0     0        High school
3  4   9                 Not in universe   0    0     0    Not in universe
4  5  10                 Not in universe   0    0     0    Not in universe
5  6  48                         Private  40   10  1200    Not in universe
I could assign random column names but is there a nicer way?
Upvotes: 10
Views: 19366
Reputation: 136
I would suggest going through this link. The default value for header is 'infer', which means that when names is not specified, read_csv takes the column names from the first row of the file. If you pass header=None instead, the columns are labelled with integers (0, 1, 2, ...).
You can also set your own column names with the names parameter, which takes an array-like (for example a list) of column names.
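As a rough sketch of the names approach: the snippet below reads a small in-memory sample instead of the real file. The sample rows and the column names (id, age, worker_class) are made up for illustration and are not taken from the actual dataset.

```python
import io
import pandas as pd

# Two header-less rows, loosely modelled on the question's data (assumed sample).
raw = "1,73,Not in universe\n2,58,Private\n"

# Hypothetical column names, chosen only for this sketch.
cols = ["id", "age", "worker_class"]

# When names is given and header is left at its default,
# the first row is kept as data rather than used as a header.
df = pd.read_csv(io.StringIO(raw), names=cols)
print(df)
```

With a real file you would pass its path instead of the io.StringIO object.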
Upvotes: 0
Reputation: 393933
You loaded the file without specifying whether it has a header row. By default, read_csv infers the header from the first row, so your first data row was used as the column names. If the file has no header row, pass header=None:
df = pd.read_csv(file_path, header=None)
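To see the difference, here is a small sketch that reads the same data both ways. It uses an in-memory string with two made-up rows in place of file_path, so the values are assumptions for illustration only.

```python
import io
import pandas as pd

# Assumed sample: two data rows and no header row.
raw = "1,73,Not in universe\n2,58,Private\n"

# Default behaviour: the first row is consumed as the header.
inferred = pd.read_csv(io.StringIO(raw))

# header=None: every row is data, and columns get integer labels.
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(inferred.shape)            # one data row left
print(no_header.shape)           # both rows kept
print(list(no_header.columns))   # integer column labels
```

The integer labels can be replaced later by assigning to df.columns if you decide on proper names.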
Upvotes: 21