Reputation: 337
Essentially having trouble reading all of the contents of a single csv file.
The first few lines of the csv file, contains 7 columns.The rest of the file contains 13 columns. I can read them fine, separately at different times, but i want to know if there is a way i could read them at once. Some photos of the csv file; (NOTE: You can ignore the nans created for the 1st dataframe, they wont be needed (only the first row will be use), i just showed them here for a full overview)
Now, I have tried using pandas read_csv
twice but will either give an error, or the file wasnt read properly. ie. if i read the first dataframe first using pandas, the second time i read the 2nd dataframe, it would have skipped the first few rows. ie. the dataframe would have a "Date(NZST)" staring at around 1940 instead of 1910 as is shown on the image.
eg.
df1 = pd.read_csv(file,skiprows = 2, nrows = 1, delimiter = '\t',header = None)
df2 = pd.read_csv(file,skiprows = 8,delimiter = '\t')
If i do it the other way around, eg. df2
is first read before df1
, it will give an EmptyDataError: No columns to parse from file
when i read df1
Im thingking from the error given it suggests that this can be fixed (maybe), if i somehow reset the reader, but i have been searching endlessly, but cant seem to find a way.
I was also thingking of reading only 7 columns as the rest of the columns wont be needed anyway ie; neither cols below works
cols = list(range(0,7))
cols = [0,1,2,3,4,5,6,7]
df1 = pd.read_csv(file,skiprows = 2,delimiter = '\t',usecols=cols)
Some samples of my Data; https://drive.google.com/drive/folders/15PwpWIh13tyOyzFUTiE9LgrxUMm-9gh6?usp=sharing
Upvotes: 1
Views: 772
Reputation: 862641
It is possible, but still better/simplier is read file twice in pandas if want correct set types
of columns by default - not all columns to strings:
r = [0,1,3,4,5,6,7]
df2 = pd.read_csv(file,skiprows = r, delimiter = '\t',header = None, names=range(13))
print (df2.head())
0 1 2 3 4 5 \
0 Woodhill Forest 1402 A64741 -36.749 174.431 30
1 Station Date(NZST) Time(NZST) Tmax(C) Period(Hrs) Tmin(C)
2 -36.7490, 174.4310 1951 01 01 09:00 - - 17.8
3 -36.7490, 174.4310 1951 01 02 09:00 24.9 24 15.6
4 -36.7490, 174.4310 1951 01 03 09:00 17.2 24 12.7
6 7 8 9 10 11 12
0 G NaN NaN NaN NaN NaN NaN
1 Period(Hrs) Tgmin(C) Period(Hrs) Tmean(C) RHmean(%) Period(Hrs) Freq
2 24 - - - - - D
3 24 - - - - - D
4 24 - - - - - D
Another solution should be read file by lines and create 2 lists for 2 DataFrames, but again get all strings - need convert each column to integer or float or if necessary to datetimes.
file = 'wgenf - 2019-04-20T204905.009.genform1_proc'
df1 = pd.read_csv(file,skiprows = 2, nrows = 1, delimiter = '\t',header = None)
df2 = pd.read_csv(file,skiprows = 8,delimiter = '\t', na_values=['-'])
print (df1.dtypes)
0 object
1 int64
2 object
3 float64
4 float64
5 int64
6 object
7 float64
dtype: object
print (df2.dtypes)
Station object
Date(NZST) object
Time(NZST) object
Tmax(C) float64
Period(Hrs) float64
Tmin(C) float64
Period(Hrs).1 float64
Tgmin(C) float64
Period(Hrs).2 float64
Tmean(C) float64
RHmean(%) float64
Period(Hrs).3 float64
Freq object
dtype: object
Upvotes: 1