Reputation: 189
I have two txt files...Structure first one is:
:Data_list: 20160203_Gs_xr_1m.txt
:Created: 2016 Feb 04 0010 UTC
# Prepared by the U.S. Dept. of Commerce, NOAA, Space Weather Prediction Center
# Please send comments and suggestions to [email protected]
#
# Label: Short = 0.05- 0.4 nanometer
# Label: Long = 0.1 - 0.8 nanometer
# Units: Short = Watts per meter squared
# Units: Long = Watts per meter squared
# Source: GOES-13
# Location: W075
# Missing data: -1.00e+05
#
# 1-minute GOES-13 Solar X-ray Flux
#
# Modified Seconds
# UTC Date Time Julian of the
# YR MO DA HHMM Day Day Short Long
#-------------------------------------------------------
2016 02 03 0000 57421 0 2.13e-09 4.60e-07
2016 02 03 0001 57421 60 1.84e-09 4.51e-07
2016 02 03 0002 57421 120 1.79e-09 4.52e-07
2016 02 03 0003 57421 180 1.58e-09 4.58e-07
2016 02 03 0004 57421 240 2.51e-09 4.56e-07
2016 02 03 0005 57421 300 4.30e-09 4.48e-07
2016 02 03 0006 57421 360 1.97e-09 4.47e-07
2016 02 03 0007 57421 420 2.46e-09 4.47e-07
2016 02 03 0008 57421 480 3.10e-09 4.51e-07
2016 02 03 0009 57421 540 3.24e-09 4.43e-07
2016 02 03 0010 57421 600 2.92e-09 4.34e-07
2016 02 03 0011 57421 660 2.42e-09 4.35e-07
2016 02 03 0012 57421 720 2.90e-09 4.40e-07
2016 02 03 0013 57421 780 1.87e-09 4.36e-07
2016 02 03 0014 57421 840 1.31e-09 4.37e-07
2016 02 03 0015 57421 900 2.50e-09 4.41e-07
2016 02 03 0016 57421 960 1.52e-09 4.42e-07
2016 02 03 0017 57421 1020 1.36e-09 4.41e-07
2016 02 03 0018 57421 1080 1.33e-09 4.35e-07
2016 02 03 0019 57421 1140 2.20e-09 4.37e-07
2016 02 03 0020 57421 1200 2.90e-09 4.53e-07
2016 02 03 0021 57421 1260 1.39e-09 4.75e-07
2016 02 03 0022 57421 1320 4.55e-09 4.67e-07
2016 02 03 0023 57421 1380 2.30e-09 4.58e-07
2016 02 03 0024 57421 1440 3.99e-09 4.53e-07
2016 02 03 0025 57421 1500 3.93e-09 4.40e-07
2016 02 03 0026 57421 1560 1.70e-09 4.34e-07
.
.
.
2016 02 03 2344 57421 85440 3.77e-09 5.00e-07
2016 02 03 2345 57421 85500 3.76e-09 4.96e-07
2016 02 03 2346 57421 85560 1.64e-09 4.97e-07
2016 02 03 2347 57421 85620 3.59e-09 5.04e-07
2016 02 03 2348 57421 85680 2.55e-09 5.04e-07
2016 02 03 2349 57421 85740 2.30e-09 5.11e-07
2016 02 03 2350 57421 85800 2.95e-09 5.09e-07
2016 02 03 2351 57421 85860 4.25e-09 5.02e-07
2016 02 03 2352 57421 85920 4.78e-09 5.02e-07
2016 02 03 2353 57421 85980 3.04e-09 5.01e-07
2016 02 03 2354 57421 86040 3.30e-09 5.10e-07
2016 02 03 2355 57421 86100 2.22e-09 5.16e-07
2016 02 03 2356 57421 86160 4.12e-09 5.15e-07
2016 02 03 2357 57421 86220 4.25e-09 5.16e-07
2016 02 03 2358 57421 86280 3.48e-09 5.20e-07
2016 02 03 2359 57421 86340 4.19e-09 5.27e-07
And second one:
:Data_list: 20160204_Gs_xr_1m.txt
:Created: 2016 Feb 04 1301 UTC
# Prepared by the U.S. Dept. of Commerce, NOAA, Space Weather Prediction Center
# Please send comments and suggestions to [email protected]
#
# Label: Short = 0.05- 0.4 nanometer
# Label: Long = 0.1 - 0.8 nanometer
# Units: Short = Watts per meter squared
# Units: Long = Watts per meter squared
# Source: GOES-13
# Location: W075
# Missing data: -1.00e+05
#
# 1-minute GOES-13 Solar X-ray Flux
#
# Modified Seconds
# UTC Date Time Julian of the
# YR MO DA HHMM Day Day Short Long
#-------------------------------------------------------
2016 02 04 0000 57422 0 4.85e-09 5.28e-07
2016 02 04 0001 57422 60 3.07e-09 5.29e-07
2016 02 04 0002 57422 120 4.48e-09 5.26e-07
2016 02 04 0003 57422 180 3.21e-09 5.17e-07
2016 02 04 0004 57422 240 4.23e-09 5.18e-07
2016 02 04 0005 57422 300 4.55e-09 5.21e-07
2016 02 04 0006 57422 360 3.30e-09 5.31e-07
2016 02 04 0007 57422 420 5.29e-09 5.49e-07
2016 02 04 0008 57422 480 3.14e-09 5.65e-07
2016 02 04 0009 57422 540 6.59e-09 5.70e-07
2016 02 04 0010 57422 600 6.04e-09 5.62e-07
2016 02 04 0011 57422 660 5.31e-09 5.62e-07
2016 02 04 0012 57422 720 6.04e-09 5.46e-07
2016 02 04 0013 57422 780 6.81e-09 5.51e-07
2016 02 04 0014 57422 840 6.59e-09 5.65e-07
2016 02 04 0015 57422 900 5.81e-09 5.62e-07
2016 02 04 0016 57422 960 4.63e-09 5.59e-07
2016 02 04 0017 57422 1020 3.05e-09 5.51e-07
2016 02 04 0018 57422 1080 3.26e-09 5.46e-07
2016 02 04 0019 57422 1140 4.59e-09 5.50e-07
2016 02 04 0020 57422 1200 3.38e-09 5.39e-07
2016 02 04 0021 57422 1260 2.43e-09 5.37e-07
2016 02 04 0022 57422 1320 5.31e-09 5.60e-07
2016 02 04 0023 57422 1380 5.63e-09 5.51e-07
2016 02 04 0024 57422 1440 5.18e-09 5.50e-07
2016 02 04 0025 57422 1500 7.06e-09 5.59e-07
2016 02 04 0026 57422 1560 5.01e-09 5.76e-07
2016 02 04 0027 57422 1620 7.17e-09 5.63e-07
2016 02 04 0028 57422 1680 5.74e-09 5.58e-07
2016 02 04 0029 57422 1740 5.55e-09 5.62e-07
2016 02 04 0030 57422 1800 4.99e-09 5.47e-07
2016 02 04 0031 57422 1860 5.49e-09 5.42e-07
2016 02 04 0032 57422 1920 2.14e-09 5.32e-07
2016 02 04 0033 57422 1980 2.48e-09 5.21e-07
2016 02 04 0034 57422 2040 4.35e-09 5.18e-07
2016 02 04 0035 57422 2100 4.84e-09 5.13e-07
2016 02 04 0036 57422 2160 3.12e-09 5.05e-07
2016 02 04 0037 57422 2220 1.18e-09 4.99e-07
2016 02 04 0038 57422 2280 1.59e-09 4.95e-07
Now I need create Pandas dataframe and name three columns...First-time that will be YYYY MM DD HHMM, second xra-penultimate column and xrb-last column...and I need find max of xrb with time ... I think that I know how find max with index with pandas but I dont know how create pandas dataframe...I have problem with 'Header' to 19th line...I need create dataframe from two files without header - only data. And is there any method how read data from some time to some time (time range)? Thanks for help
edit: I have this script:
import urllib2
import sys
import datetime
import pandas as pd
xray_flux = urllib2.urlopen('ftp://ftp.swpc.noaa.gov/pub/lists/xray/'+date+'_Gp_xr_1m.txt')
flux=xray_flux.read()
dataflux= open('xray_flux.txt','w')
dataflux.write(flux)
dataflux.close()
a=pd.read_csv("xray_flux.txt",header=None, sep=" ",error_bad_lines=False,skiprows=19)
print a
df=pd.DataFrame(a)
print df
Upvotes: 1
Views: 137
Reputation: 863741
You can try read_csv
and concat
:
dateparse = lambda x: pd.datetime.strptime(x, '%Y %m %d %H%M')
#df1 = pd.read_csv('ftp://ftp.swpc.noaa.gov/pub/lists/xray/'+date+'_Gp_xr_1m.txt',
#after testing replace io.StringIO(temp) to filename
df1 = pd.read_csv(io.StringIO(temp1),
sep="\s+",
index_col=None,
skiprows=19,
parse_dates={'datetime': [0,1,2,3]},
header=None,
date_parser=dateparse)
print df1.head()
datetime 4 5 6 7
0 2016-02-03 00:00:00 57421 0 2.130000e-09 4.600000e-07
1 2016-02-03 00:01:00 57421 60 1.840000e-09 4.510000e-07
2 2016-02-03 00:02:00 57421 120 1.790000e-09 4.520000e-07
3 2016-02-03 00:03:00 57421 180 1.580000e-09 4.580000e-07
4 2016-02-03 00:04:00 57421 240 2.510000e-09 4.560000e-07
#df = pd.read_csv('ftp://ftp.swpc.noaa.gov/pub/lists/xray/'+date+'_Gp_xr_1m.txt',
#after testing replace io.StringIO(temp) to filename
df2 = pd.read_csv(io.StringIO(temp),
sep="\s+",
index_col=None,
skiprows=19,
parse_dates={'datetime': [0,1,2,3]},
header=None,
date_parser=dateparse)
print df2.head()
datetime 4 5 6 7
0 2016-02-04 00:00:00 57422 0 4.850000e-09 5.280000e-07
1 2016-02-04 00:01:00 57422 60 3.070000e-09 5.290000e-07
2 2016-02-04 00:02:00 57422 120 4.480000e-09 5.260000e-07
3 2016-02-04 00:03:00 57422 180 3.210000e-09 5.170000e-07
4 2016-02-04 00:04:00 57422 240 4.230000e-09 5.180000e-07
df = pd.concat([df1[['datetime',6,7]],df2[['datetime',6,7]]])
df.columns = ['datetime','xra','xrb']
print df.head(10)
datetime xra xrb
0 2016-02-03 00:00:00 2.130000e-09 4.600000e-07
1 2016-02-03 00:01:00 1.840000e-09 4.510000e-07
2 2016-02-03 00:02:00 1.790000e-09 4.520000e-07
3 2016-02-03 00:03:00 1.580000e-09 4.580000e-07
4 2016-02-03 00:04:00 2.510000e-09 4.560000e-07
5 2016-02-03 00:05:00 4.300000e-09 4.480000e-07
6 2016-02-03 00:06:00 1.970000e-09 4.470000e-07
7 2016-02-03 00:07:00 2.460000e-09 4.470000e-07
8 2016-02-03 00:08:00 3.100000e-09 4.510000e-07
9 2016-02-03 00:09:00 3.240000e-09 4.430000e-07
EDIT:
Also you can use parameter usecols
in read_csv
for filtering columns - you need only columns datetime
, 6
and 7
. Then you can use all df1
and df2
in pd.concat
:
#df1 = pd.read_csv('ftp://ftp.swpc.noaa.gov/pub/lists/xray/'+date+'_Gp_xr_1m.txt',
df1 = pd.read_csv(io.StringIO(temp1),
sep="\s+",
index_col=None,
skiprows=19,
parse_dates={'datetime': [0,1,2,3]},
header=None,
date_parser=dateparse,
usecols=[0, 1, 2, 3, 6, 7])
print df1.head()
datetime 6 7
0 2016-02-03 00:00:00 2.130000e-09 4.600000e-07
1 2016-02-03 00:01:00 1.840000e-09 4.510000e-07
2 2016-02-03 00:02:00 1.790000e-09 4.520000e-07
3 2016-02-03 00:03:00 1.580000e-09 4.580000e-07
#df2 = pd.read_csv('ftp://ftp.swpc.noaa.gov/pub/lists/xray/'+date+'_Gp_xr_1m.txt',
df2 = pd.read_csv(io.StringIO(temp),
sep="\s+",
index_col=None,
skiprows=19,
parse_dates={'datetime': [0,1,2,3]},
header=None,
date_parser=dateparse,
usecols=[0, 1, 2, 3, 6, 7])
print df2.head()
datetime 6 7
0 2016-02-04 00:00:00 4.850000e-09 5.280000e-07
1 2016-02-04 00:01:00 3.070000e-09 5.290000e-07
2 2016-02-04 00:02:00 4.480000e-09 5.260000e-07
3 2016-02-04 00:03:00 3.210000e-09 5.170000e-07
4 2016-02-04 00:04:00 4.230000e-09 5.180000e-07
df = pd.concat([df1,df2])
df.columns = ['datetime','xra','xrb']
print df
datetime xra xrb
0 2016-02-03 00:00:00 2.130000e-09 4.600000e-07
1 2016-02-03 00:01:00 1.840000e-09 4.510000e-07
2 2016-02-03 00:02:00 1.790000e-09 4.520000e-07
3 2016-02-03 00:03:00 1.580000e-09 4.580000e-07
4 2016-02-03 00:04:00 2.510000e-09 4.560000e-07
5 2016-02-03 00:05:00 4.300000e-09 4.480000e-07
6 2016-02-03 00:06:00 1.970000e-09 4.470000e-07
7 2016-02-03 00:07:00 2.460000e-09 4.470000e-07
8 2016-02-03 00:08:00 3.100000e-09 4.510000e-07
9 2016-02-03 00:09:00 3.240000e-09 4.430000e-07
Upvotes: 1