WDS
WDS

Reputation: 337

Reading two Complete different dataframes from a single csv file

Essentially having trouble reading all of the contents of a single csv file. The first few lines of the csv file, contains 7 columns.The rest of the file contains 13 columns. I can read them fine, separately at different times, but i want to know if there is a way i could read them at once. Some photos of the csv file; (NOTE: You can ignore the nans created for the 1st dataframe, they wont be needed (only the first row will be use), i just showed them here for a full overview) enter image description here

enter image description here

enter image description here

Now, I have tried using pandas read_csv twice but will either give an error, or the file wasnt read properly. ie. if i read the first dataframe first using pandas, the second time i read the 2nd dataframe, it would have skipped the first few rows. ie. the dataframe would have a "Date(NZST)" staring at around 1940 instead of 1910 as is shown on the image. eg.

df1 = pd.read_csv(file,skiprows = 2, nrows = 1, delimiter = '\t',header = None)
df2 = pd.read_csv(file,skiprows = 8,delimiter = '\t')

If i do it the other way around, eg. df2 is first read before df1, it will give an EmptyDataError: No columns to parse from file when i read df1

cols = list(range(0,7))
cols = [0,1,2,3,4,5,6,7] 
df1 = pd.read_csv(file,skiprows = 2,delimiter = '\t',usecols=cols)

Some samples of my Data; https://drive.google.com/drive/folders/15PwpWIh13tyOyzFUTiE9LgrxUMm-9gh6?usp=sharing

Upvotes: 1

Views: 772

Answers (1)

jezrael
jezrael

Reputation: 862641

It is possible, but still better/simplier is read file twice in pandas if want correct set types of columns by default - not all columns to strings:

r = [0,1,3,4,5,6,7]
df2 = pd.read_csv(file,skiprows = r,  delimiter = '\t',header = None, names=range(13))
print (df2.head())
                   0           1           2        3            4        5   \
0     Woodhill Forest        1402      A64741  -36.749      174.431       30   
1             Station  Date(NZST)  Time(NZST)  Tmax(C)  Period(Hrs)  Tmin(C)   
2  -36.7490, 174.4310  1951 01 01       09:00        -            -     17.8   
3  -36.7490, 174.4310  1951 01 02       09:00     24.9           24     15.6   
4  -36.7490, 174.4310  1951 01 03       09:00     17.2           24     12.7   

            6         7            8         9          10           11    12  
0            G       NaN          NaN       NaN        NaN          NaN   NaN  
1  Period(Hrs)  Tgmin(C)  Period(Hrs)  Tmean(C)  RHmean(%)  Period(Hrs)  Freq  
2           24         -            -         -          -            -     D  
3           24         -            -         -          -            -     D  
4           24         -            -         -          -            -     D  

Another solution should be read file by lines and create 2 lists for 2 DataFrames, but again get all strings - need convert each column to integer or float or if necessary to datetimes.


file = 'wgenf - 2019-04-20T204905.009.genform1_proc'
df1 = pd.read_csv(file,skiprows = 2, nrows = 1, delimiter = '\t',header = None)
df2 = pd.read_csv(file,skiprows = 8,delimiter = '\t', na_values=['-'])
print (df1.dtypes)
0     object
1      int64
2     object
3    float64
4    float64
5      int64
6     object
7    float64
dtype: object

print (df2.dtypes)
Station           object
Date(NZST)        object
Time(NZST)        object
Tmax(C)          float64
Period(Hrs)      float64
Tmin(C)          float64
Period(Hrs).1    float64
Tgmin(C)         float64
Period(Hrs).2    float64
Tmean(C)         float64
RHmean(%)        float64
Period(Hrs).3    float64
Freq              object
dtype: object

Upvotes: 1

Related Questions