Reputation: 775
Im trying to make the header that is multiple row to be as a single row header, i already tried to add the header on read_csv function but it doesnt seem to be working, there is also extra gap on the column that i display i dont know what's causing it, this is the csv im trying to make the dataframe of https://gist.github.com/ktisha/c21e73a1bd1700294ef790c56c8aec1f This is the display of the result that i code on Spyder, you can see the header is still displayed as different rows, i want it to shows as single row header [#1.... #2... #3.... etc] and is there any way to remove the [# number.] from the header? if yes, how? and then you can also see that there's a single column that is afar from the others (extra gap) that i dont want it
import csv
import pandas as pd
with open("D:\python projects\pima-indians-diabetes.csv") as f:
d = pd.read_csv(f , delimiter = ',',header = [0,1,2,3,4,5,6,7,8])
df = pd.DataFrame(d)
print (df[:4])
Upvotes: 0
Views: 107
Reputation: 771
EDIT:
so I loaded the real data now and did the following:
headers = pd.read_csv('pima-indians-diabetes.csv',nrows=9,header=None).loc[:,0].str.replace('# ','')
df = pd.read_csv('pima-indians-diabetes.csv',names=headers, delimiter=',',skiprows=9)
print(df)
1.) so for variable "headers" I only loaded the first 9 rows which contain the headers in the csv-file.
2.) I then only used column "0" and replaced the "# " there as you wanted.
3.) For the second load I skiped the first 9 rows to only load the data rows and handed the headers as the "names" parameter.
Please make sure to change the location the data is stored to your location before executing!
Loading the data two times is probably not best practice but the only way I could think of dealing with this now.
RESULT:
1. Number of times pregnant 2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test 3. Diastolic blood pressure (mm Hg) 4. Triceps skin fold thickness (mm) ... 6. Body mass index (weight in kg/(height in m)^2) 7. Diabetes pedigree function 8. Age (years) 9. Class variable (0 or 1)
0 6 148 72 35 ... 33.6 0.627 50 1
1 1 85 66 29 ... 26.6 0.351 31 0
2 8 183 64 0 ... 23.3 0.672 32 1
3 1 89 66 23 ... 28.1 0.167 21 0
4 0 137 40 35 ... 43.1 2.288 33 1
5 5 116 74 0 ... 25.6 0.201 30 0
6 3 78 50 32 ... 31.0 0.248 26 1
7 10 115 0 0 ... 35.3 0.134 29 0
8 2 197 70 45 ... 30.5 0.158 53 1
9 8 125 96 0 ... 0.0 0.232 54 1
10 4 110 92 0 ... 37.6 0.191 30 0
11 10 168 74 0 ... 38.0 0.537 34 1
12 10 139 80 0 ... 27.1 1.441 57 0
13 1 189 60 23 ... 30.1 0.398 59 1
14 5 166 72 19 ... 25.8 0.587 51 1
15 7 100 0 0 ... 30.0 0.484 32 1
16 0 118 84 47 ... 45.8 0.551 31 1
17 7 107 74 0 ... 29.6 0.254 31 1
18 1 103 30 38 ... 43.3 0.183 33 0
19 1 115 70 30 ... 34.6 0.529 32 1
20 3 126 88 41 ... 39.3 0.704 27 0
21 8 99 84 0 ... 35.4 0.388 50 0
22 7 196 90 0 ... 39.8 0.451 41 1
23 9 119 80 35 ... 29.0 0.263 29 1
24 11 143 94 33 ... 36.6 0.254 51 1
25 10 125 70 26 ... 31.1 0.205 41 1
26 7 147 76 0 ... 39.4 0.257 43 1
27 1 97 66 15 ... 23.2 0.487 22 0
28 13 145 82 19 ... 22.2 0.245 57 0
29 5 117 92 0 ... 34.1 0.337 38 0
.. ... ... ... ... ... ... ... ... ...
738 2 99 60 17 ... 36.6 0.453 21 0
739 1 102 74 0 ... 39.5 0.293 42 1
740 11 120 80 37 ... 42.3 0.785 48 1
741 3 102 44 20 ... 30.8 0.400 26 0
742 1 109 58 18 ... 28.5 0.219 22 0
743 9 140 94 0 ... 32.7 0.734 45 1
744 13 153 88 37 ... 40.6 1.174 39 0
745 12 100 84 33 ... 30.0 0.488 46 0
746 1 147 94 41 ... 49.3 0.358 27 1
747 1 81 74 41 ... 46.3 1.096 32 0
748 3 187 70 22 ... 36.4 0.408 36 1
749 6 162 62 0 ... 24.3 0.178 50 1
750 4 136 70 0 ... 31.2 1.182 22 1
751 1 121 78 39 ... 39.0 0.261 28 0
752 3 108 62 24 ... 26.0 0.223 25 0
753 0 181 88 44 ... 43.3 0.222 26 1
754 8 154 78 32 ... 32.4 0.443 45 1
755 1 128 88 39 ... 36.5 1.057 37 1
756 7 137 90 41 ... 32.0 0.391 39 0
757 0 123 72 0 ... 36.3 0.258 52 1
758 1 106 76 0 ... 37.5 0.197 26 0
759 6 190 92 0 ... 35.5 0.278 66 1
760 2 88 58 26 ... 28.4 0.766 22 0
761 9 170 74 31 ... 44.0 0.403 43 1
762 9 89 62 0 ... 22.5 0.142 33 0
763 10 101 76 48 ... 32.9 0.171 63 0
764 2 122 70 27 ... 36.8 0.340 27 0
765 5 121 72 23 ... 26.2 0.245 30 0
766 1 126 60 0 ... 30.1 0.349 47 1
767 1 93 70 31 ... 30.4 0.315 23 0
FIRST ANSWER:
sample data:
0 1 2 3 4 5 6 7 8
0 NaN NaN NaN NaN NaN NaN NaN NaN first
1 NaN NaN NaN NaN NaN NaN NaN NaN second
2 NaN NaN NaN NaN NaN NaN NaN NaN thrid
3 NaN NaN NaN NaN NaN NaN NaN NaN 4
4 NaN NaN NaN NaN NaN NaN NaN NaN 5
5 NaN NaN NaN NaN NaN NaN NaN NaN 6
6 NaN NaN NaN NaN NaN NaN NaN NaN 7
7 NaN NaN NaN NaN NaN NaN NaN NaN 8
8 NaN NaN NaN NaN NaN NaN NaN NaN 9
9 d d d d d d d d d
10 d d d d d d d d d
11 d d d d d d d d d
12 d d d d d d d d d
Code:
import pandas as pd
headers = pd.read_csv('test.csv',header=None).loc[0:8,8].tolist()
df = pd.read_csv('test.csv',names=headers, skiprows=9)
print(df)
result:
first second thrid 4 5 6 7 8 9
0 d d d d d d d d d
1 d d d d d d d d d
2 d d d d d d d d d
3 d d d d d d d d d
Upvotes: 1