Johan R
Johan R

Reputation: 137

Pandas reading in file that has uneven column lengths

I'm trying to read in a discharge data file which looks like this:

Station number: 420
Location:       Kotagaon Shringe                                                  
Latitude: 27 45 00
River:          Kali Gandaki                                                     
Longitude: 84 20 50

Year:           2001

                                Mean daily discharge in m3/s
                                ============================

Day     Jan.   Feb.   Mar.   Apr.   May    Jun.   Jul.   Aug.   Sep.   Oct.   Nov.   Dec.   Year
 01      118   99.3   85.9   75.5    119    182    656   2790   1690    402    232    158
 02      123   97.4   82.9   74.3    134    251    514   2420   2180    397    230    158
 03      118   95.5   80.7   73.1    168    377    466   2190   2190    386    226    157
-------------------------------- Skipping some rows of no real interest
 25     95.5   85.5   70.7   83.3    163    583    898   3230    485    257    177    123
 26     94.1   88.6   69.9   84.6    167    579    996   2330    474    252    175    121
 27     92.2   88.6   71.9   88.1    166    736   1180   2270    461    248    173    120
 28     91.8   87.3   69.9   91.3    172    419   1020   2270    431    246    168    118
 29     95.5          71.9   93.2    165    446   1670   2140    410    244    163    118
 30     98.4          76.0    109    176    575   2040   2100    403    239    159    117
 31     98.4          75.1           174          3330   1600           234           117

My problem is that when using white space as a separator it does shift over the March value at day 29 since February got no day 29. And again for other places with empty/no values.

Is there a good way to work around this?

I have looked for solutions online but all I could find is dealing with uneven row length, not uneven column length.

My attempt this far has resulted in the code:

disc = pd.read_csv(filename,header = 6,sep = '\s+',nrows = 31)
disc['Year'] = 2001

With the dataframe looking like:

   Day  Jan.    Feb.    Mar.    Apr.    May Jun.    Jul.    Aug.    Sep.    Oct.    Nov.    Dec.    Year
0   1   118.0   99.3    85.9    75.5    119 182     656   2790.0    1690.0  402.0   232.0   158.0   2001
1   2   123.0   97.4    82.9    74.3    134 251     514   2420.0    2180.0  397.0   230.0   158.0   2001
2   3   118.0   95.5    80.7    73.1    168 377     466   2190.0    2190.0  386.0   226.0   157.0   2001
----------------------------------------------- Skipping some rows of no real interest
28  29  95.5    71.9    93.2    165.0   446 1670    2140    410.0   244.0   163.0   118.0   NaN     2001
29  30  98.4    76.0    109.0   176.0   575 2040    2100    403.0   239.0   159.0   117.0   NaN     2001
30  31  98.4    75.1    174.0   3330.0  1600 234     117      NaN NaN        NaN    NaN     NaN     2001

Upvotes: 2

Views: 426

Answers (1)

rahlf23
rahlf23

Reputation: 9019

You can use the pd.read_fwf() module for reading fixed-width files and leverage the skiprows keyword:

disc = pd.read_fwf('test.csv', skiprows=11)

Yields:

   Day   Jan.  Feb.  Mar.   Apr.  ...     Sep.  Oct.   Nov.  Dec.  Year
0    1  118.0  99.3  85.9   75.5  ...   1690.0   402  232.0   158   NaN
1    2  123.0  97.4  82.9   74.3  ...   2180.0   397  230.0   158   NaN
2    3  118.0  95.5  80.7   73.1  ...   2190.0   386  226.0   157   NaN
3   25   95.5  85.5  70.7   83.3  ...    485.0   257  177.0   123   NaN
4   26   94.1  88.6  69.9   84.6  ...    474.0   252  175.0   121   NaN
5   27   92.2  88.6  71.9   88.1  ...    461.0   248  173.0   120   NaN
6   28   91.8  87.3  69.9   91.3  ...    431.0   246  168.0   118   NaN
7   29   95.5   NaN  71.9   93.2  ...    410.0   244  163.0   118   NaN
8   30   98.4   NaN  76.0  109.0  ...    403.0   239  159.0   117   NaN
9   31   98.4   NaN  75.1    NaN  ...      NaN   234    NaN   117   NaN

Upvotes: 1

Related Questions