Reputation: 15
I have 10 files, all of which are of type astropy.table.table.Table and all made of the same six columns (mjd, filter, flux, flux_error, zp, zpsys), but they have different lengths. First I want to convert each file to a pandas.core.frame.DataFrame so that I can add them all to one list and use pd.concat to turn the 10 files into one big DataFrame. I have tried this:
import numpy as np
import pandas as pd
from astropy.table import Table

n = 10
li = []
for i in range(0, n):
    file = "training_data/%s.dat" % i  # This way I can call each file automatically
    data = Table.read(file, format="ascii")
    data = pd.read_table(file)  # I convert the file to pandas compatible
    li.append(data)  # I add the file into the empty list above

# now I have my list ready so I compress it into 1 file
all_data = pd.concat(li)
The problem with this method is that all six columns get compressed into one column for some reason, which makes it impossible for me to do the rest of the work.
When I check the shape of all_data I get (879, 1), and it looks like this:
all_data.head()
mjd filter flux flux_error zp zpsys
0 0.0 desg -4.386 4.679 27.5 ab
1 0.011000000005878974 desr -0.5441 2.751 27.5 ab
2 0.027000000001862645 desi 0.4547 4.627 27.5 ab
3 0.043000000005122274 desz -1.047 4.462 27.5 ab
4 13.043000000005122 desg -4.239 4.366 27.5 ab
So how can I make a file like this but keep my columns as separate columns?
Here is a sample of my data in file 0:
mjd filter flux flux_error zp zpsys
float64 str4 float64 float64 float64 str2
0.0 desg -4.386 4.679 27.5 ab
0.0110000 desr -0.5441 2.751 27.5 ab
0.0270000 desi 0.4547 4.627 27.5 ab
0.0430000 desz -1.047 4.462 27.5 ab
13.043000 desg -4.239 4.366 27.5 ab
13.050000 desr 4.695 3.46 27.5 ab
13.058000 desi 6.291 6.248 27.5 ab
13.074000 desz 6.412 5.953 27.5 ab
21.050000 desg 1.588 2.681 27.5 ab
21.058000 desr -0.6124 2.171 27.5 ab
Upvotes: 0
Views: 778
Reputation: 15
The solution was to include the sep argument in pd.read_table() so that it keeps each column as a separate column, specifying the separator as "\s+":
import pandas as pd

n = 10
li = []
for i in range(0, n):
    file = "training_data/%s.dat" % i  # This way I can call each file automatically
    data = pd.read_table(file, sep=r"\s+")  # I convert the file to pandas compatible
    li.append(data)  # I add the file into the empty list above

# now I have my list ready so I compress it into 1 file
all_data = pd.concat(li)
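For reference, astropy tables also have a to_pandas() method, so another option is to read each file with Table.read and convert directly instead of re-reading it with pandas. A minimal sketch, assuming Table.read can parse each file and the same training_data/%s.dat paths as in the question:

import pandas as pd
from astropy.table import Table

n = 10
li = []
for i in range(0, n):
    file = "training_data/%s.dat" % i
    tab = Table.read(file, format="ascii")   # read as an astropy Table
    li.append(tab.to_pandas())               # convert the Table to a DataFrame
all_data = pd.concat(li, ignore_index=True)  # ignore_index gives one clean 0..N-1 index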
Upvotes: 0
Reputation: 2542
It could be that Table.read() is not able to guess the format / delimiter of your data. I'm able to read the included example (the data in file 0) using Table.read(file, format='ascii', data_start=2) into a table with 6 columns, but I'm not sure the whitespace is being captured correctly.
I am suspicious that the example data in file 0 is not literally what you are reading, because without data_start=2 that file will show up with row 1 being "float64 str4 float64 float64 float64 str2".
One thing you can do is try Table.read(file, format='ascii', data_start=2, guess=False).
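For concreteness, a minimal sketch of that call on the first file (the path training_data/0.dat is taken from the question, and the six column names are just what you should see if the read succeeds):

from astropy.table import Table

# skip the dtype row ("float64 str4 ...") by starting the data at line 2,
# and disable format guessing so the reader settings are used as given
tab = Table.read("training_data/0.dat", format="ascii", data_start=2, guess=False)
print(tab.colnames)  # expect: ['mjd', 'filter', 'flux', 'flux_error', 'zp', 'zpsys']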
Upvotes: 1