Reputation: 1220
I have a .csv I am trying to read into a pandas data frame that has multiple rows of column headers, but the first row is sparsely labeled.
ex:
Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,,
,RH=0.8,,,,RH=0.9,,,
,n_=_60,,,,n_=_29,,,
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o)
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02
1.83E-03,7.54E-02,0.124370072,0.124367091,7.76E-02,8.54E-02,0.121274897,0.121273799,8.46E-02
2.44E-03,7.76E-02,0.136590839,0.136586865,7.99E-02,5.45E-02,0.100995665,0.100994202,5.40E-02
3.05E-03,8.73E-02,0.141422799,0.141417832,8.98E-02,7.57E-02,0.170033442,0.170031614,7.50E-02
3.66E-03,7.29E-02,0.143599074,0.143593115,7.50E-02,0.10001777,0.165468366,0.165466173,9.91E-02
When I read the csv,
Cosp2 = pd.read_csv(DPath,index_col=0, header=[1,3])
print(Cosp2)
I end up with Unnamed: #_level_0 labels on all the headers first level headers that aren't explicit labeled.
RH=0.8 Unnamed: 2_level_0 Unnamed: 3_level_0 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o)
0.00061 0.0840 0.117551 0.117550
0.00122 0.0749 0.126468 0.126466
0.00183 0.0754 0.124370 0.124367
0.00244 0.0776 0.136591 0.136587
0.00305 0.0873 0.141423 0.141418
0.00366 0.0729 0.143599 0.143593
Unnamed: 4_level_0 RH=0.9 Unnamed: 6_level_0 \
nat_freq pred_sp(h2o) avrg_sp(T) avrg_sp(h2o)
0.00061 0.0864 0.128697 0.163304
0.00122 0.0770 0.090500 0.200350
0.00183 0.0776 0.085400 0.121275
0.00244 0.0799 0.054500 0.100996
0.00305 0.0898 0.075700 0.170033
0.00366 0.0750 0.100018 0.165468
Unnamed: 7_level_0 Unnamed: 8_level_0
nat_freq denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.163304 0.127553
0.00122 0.200350 0.089700
0.00183 0.121274 0.084600
0.00244 0.100994 0.054000
0.00305 0.170032 0.075000
0.00366 0.165466 0.099100
Is there a way to get pandas to propagate the level 0 labels across the unlabeled columns? I'd like something that looks like this:
RH=0.8 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.0840 0.117551 0.117550 0.0864
0.00122 0.0749 0.126468 0.126466 0.0770
0.00183 0.0754 0.124370 0.124367 0.0776
0.00244 0.0776 0.136591 0.136587 0.0799
0.00305 0.0873 0.141423 0.141418 0.0898
0.00366 0.0729 0.143599 0.143593 0.0750
RH=0.9
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.128697 0.163304 0.163304 0.127553
0.00122 0.090500 0.200350 0.200350 0.089700
0.00183 0.085400 0.121275 0.121274 0.084600
0.00244 0.054500 0.100996 0.100994 0.054000
0.00305 0.075700 0.170033 0.170032 0.075000
0.00366 0.100018 0.165468 0.165466 0.099100
Upvotes: 0
Views: 102
Reputation: 863291
You can use get_level_values
with to_series
for Series
first:
a = Cosp2.columns.get_level_values(0).to_series()
print (a)
RH=0.8 RH=0.8
Unnamed: 2_level_0 Unnamed: 2_level_0
Unnamed: 3_level_0 Unnamed: 3_level_0
Unnamed: 4_level_0 Unnamed: 4_level_0
RH=0.9 RH=0.9
Unnamed: 6_level_0 Unnamed: 6_level_0
Unnamed: 7_level_0 Unnamed: 7_level_0
Unnamed: 8_level_0 Unnamed: 8_level_0
dtype: object
Then use mask
for NaN
s if startswith
Unnamed
and replace NaN
by ffill
(fillna
with method='ffill'
)
b = a.mask(a.str.startswith('Unnamed')).ffill()
print (b)
RH=0.8 RH=0.8
Unnamed: 2_level_0 RH=0.8
Unnamed: 3_level_0 RH=0.8
Unnamed: 4_level_0 RH=0.8
RH=0.9 RH=0.9
Unnamed: 6_level_0 RH=0.9
Unnamed: 7_level_0 RH=0.9
Unnamed: 8_level_0 RH=0.9
dtype: object
Last create new MultiIndex
by from_arrays
:
Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)])
print (Cosp2)
RH=0.8 \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.0840 0.117551 0.117550 0.0864
0.00122 0.0749 0.126468 0.126466 0.0770
0.00183 0.0754 0.124370 0.124367 0.0776
0.00244 0.0776 0.136591 0.136587 0.0799
0.00305 0.0873 0.141423 0.141418 0.0898
0.00366 0.0729 0.143599 0.143593 0.0750
RH=0.9
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)
0.00061 0.128697 0.163304 0.163304 0.127553
0.00122 0.090500 0.200350 0.200350 0.089700
0.00183 0.085400 0.121275 0.121274 0.084600
0.00244 0.054500 0.100996 0.100994 0.054000
0.00305 0.075700 0.170033 0.170032 0.075000
0.00366 0.100018 0.165468 0.165466 0.099100
Upvotes: 1