Reputation: 5596
In my application I have generated a number of values (three columns, of type int, str and datetime, see example below) and these values are stored in a flat file as comma-separated strings. Furthermore, I store a file containing the type of the values (see below). Now, how can I use this information to cast my values from the flat file to the correct data type in Python? Is is possible or do I need to do some other stuff?
Data file:
#id,value,date
1,a,2011-09-13 15:00:00
2,b,2011-09-13 15:10:00
3,c,2011-09-13 15:20:00
4,d,2011-09-13 15:30:00
Type file:
id,<type 'int'>
value,<type 'str'>
date,<type 'datetime.datetime'>
Upvotes: 4
Views: 9233
Reputation: 31567
As I understand, you already parsed the file, you now just need to get the right type. So let's say id_
, type_
and value
are three strings that contain the values in the file. (Note, type_
should contain 'int'
— for example —, not '<type 'int'>'
.
def convert(value, type_):
import importlib
try:
# Check if it's a builtin type
module = importlib.import_module('__builtin__')
cls = getattr(module, type_)
except AttributeError:
# if not, separate module and class
module, type_ = type_.rsplit(".", 1)
module = importlib.import_module(module)
cls = getattr(module, type_)
return cls(value)
Then you can use it like..:
value = convert("5", "int")
Unfortunately for datetime this doesnt work though, as it can not be simply initialized by its string representation.
Upvotes: 4
Reputation: 4493
You might want to look at the xlrd module. If you can load your data into excel, and it knows what type is associated with each column, xlrd will give you the type when you read the excel file. Of course, if the data is given to you as a csv then someone would have to go into the excel file and change the column types by hand.
Not sure this gets you all the way to where you want to go, but it might help
Upvotes: 0
Reputation: 69031
Your types file can be simpler:
id=int
value=str
date=datetime.datetime
Then in your main program you can
import datetime
def convert_datetime(text):
return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
data_types = {'int':int, 'str':str, 'datetime.datetime':convert_datetime}
fields = {}
for line in open('example_types.txt').readlines():
key, val = line.strip().split('=')
fields[key] = val
data_file = open('actual_data.txt')
field_info = data_file.readline().strip('#\n ').split(',')
values = [] #store it all here for now
for line in data_file.readlines():
row = []
for i, element in enumerate(line.strip().split(',')):
element_type = fields[field_info[i]] # will get 'int', 'str', or 'datetime'
convert = data_types[element_type]
row.append(convert(element))
values.append(row)
# to show it working...
for row in values:
print row
Upvotes: 2
Reputation: 391848
First, you cannot write a "universal" or "smart" conversion that magically handles anything.
Second, trying to summarize a string-to-data conversion in anything other than code never seems to work out well. So rather than write a string that names the conversion, just write the conversion.
Finally, trying to write a configuration file in a domain-specific language is silly. Just write Python code. It's not much more complicated than trying to parse some configuration file.
Is is possible or do i need to do some other stuff?
Don't waste time trying to create a "type file" that's not simply Python. It doesn't help. It is simpler to write the conversion as a Python function. You can import that function as if it was your "type file".
import datetime
def convert( row ):
return dict(
id= int(row['id']),
value= str(row['value']),
date= datetime.datetime.strptime(row['date],"%Y-%m-%d %H:%M:%S"),
)
That's all you have in your "type file"
Now you can read (and process) your input like this.
from type_file import convert
import csv
with open( "date", "rb" ) as source:
rdr= csv.DictReader( source )
for row in rdr:
useful_row= convert( row )
in many cases i do not know the number of columns or the data type before runtime
This means you are doomed.
You must have an actual definition the file content or you cannot do any processing.
"id","value","other value"
1,23507,3
You don't know if "23507" should be an integer, a string, a postal code, or a floating-point (which omitted the period), a duration (in days or seconds) or some other more complex thing. You can't hope and you can't guess.
After getting a definition, you need to write an explicit conversion function based on the actual definition.
After writing the conversion, you need to (a) test the conversion with a simple unit test, and (b) test the data to be sure it really converts.
Then you can process the file.
Upvotes: 0
Reputation: 5898
Instead of having a separate "type" file, take your list of tuples of (id, value, date)
and just pickle
it.
Or you'll have to solve the problem of storing your string-to-type converters as text (in your "type" file), which might be a fun problem to solve, but if you're just trying to get something done, go with pickle
or cPickle
Upvotes: 0
Reputation: 13576
I had to deal with a similar situation in a recent program, that had to convert many fields. I used a list of tuples, where one element of the tuples was the conversion function to use. Sometimes it was int
or float
; sometimes it was a simple lambda
; and sometimes it was the name of a function defined elsewhere.
Upvotes: 1
Reputation: 1009
Follow these steps:
split()
with ,
as the separator.(e.g. using slices)
and make a datetime
object of the same.Upvotes: 1