Reputation: 1289

Python read data into array or list

I need to read data from a text file, manipulate it, and store it all in an array, a list, or some other data structure so that I can tabulate it and plot it using matplotlib.

I intend to use an input statement to store a treatment date and time. This datetime will then be subtracted from the dates and times in the input file to give the time since treatment (in either minutes or hours).

Firstly, the input file I am working with is in the following format:

!05/04/2014
@1332
Contact Angle (deg)     106.87
Contact Angle Left (deg)    106.90
Contact Angle Right (deg)   106.85
Wetting Tension (mN/m)      -21.13
Wetting Tension Left (mN/m) -21.16
Wetting Tension Right (mN/m)    -21.11
Base Tilt Angle (deg)       0.64
Base (mm)           1.7001
Base Area (mm2)         2.2702
Height (mm)         1.1174
Sessile Volume (ul)     2.1499
Sessile Surface Area (mm2)  6.3842
Contrast (cts)          255
Sharpness (cts)         186
Black Peak (cts)        0
White Peak (cts)        255
Edge Threshold (cts)        105
Base Left X (mm)        2.435
Base Right X (mm)       4.135
Base Y (mm)         3.801
RMS Fit Error (mm)      2.201E-3

@1333
Contact Angle (deg)     105.42
Contact Angle Left (deg)    106.04
Contact Angle Right (deg)   104.80
Wetting Tension (mN/m)      -19.36
Wetting Tension Left (mN/m) -20.12
Wetting Tension Right (mN/m)    -18.59
Base Tilt Angle (deg)       0.33
Base (mm)           1.6619
Base Area (mm2)         2.1691
Height (mm)         0.9837
Sessile Volume (ul)     1.6893
Sessile Surface Area (mm2)  5.3962
Contrast (cts)          255
Sharpness (cts)         190
Black Peak (cts)        0
White Peak (cts)        255
Edge Threshold (cts)        105
Base Left X (mm)        2.397
Base Right X (mm)       4.040
Base Y (mm)         3.753
RMS Fit Error (mm)      3.546E-3

In the file, each new date starts with an '!' and is in the format shown (dd/mm/yyyy).
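Assuming the digits after '@' are the measurement time as hhmm (which is how the code further down treats them), a '!' date line and an '@' time line can be combined into a single datetime with datetime.strptime. A minimal sketch; the helper name combine_date_time is hypothetical:

```python
from datetime import datetime

def combine_date_time(date_line, time_line):
    """Build a datetime from a '!dd/mm/yyyy' line and an '@hhmm' line.

    Assumes the '@' digits are a 24-hour time written as hhmm.
    """
    date_part = date_line.strip().lstrip('!')   # e.g. '05/04/2014'
    time_part = time_line.strip().lstrip('@')   # e.g. '1332'
    return datetime.strptime(date_part + ' ' + time_part, '%d/%m/%Y %H%M')

print(combine_date_time('!05/04/2014', '@1332'))  # 2014-04-05 13:32:00
```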

The tables should contain the datetime from the input file, the contact angle and finally minutes since treatment.

The code below extracts the relevant information needed from the text file and writes it to another file, but I don't know what the best way to store the information is.

current_date = ""
measure_time = ""
with open(infile) as f, open(outfile, 'w') as f2:
    for line in f:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0].startswith('!'):
            # '!dd/mm/yyyy' -> write 'dd mm yyyy'
            current_date = words[0][1:11]
            f2.write(current_date[:2] + ' ' + current_date[3:5] + ' ' + current_date[6:] + '\n')
        elif words[0].startswith('@'):
            # '@hhmm' -> write 'hh:mm'
            measure_time = words[0][1:5]
            f2.write(measure_time[:2] + ':' + measure_time[2:] + '\n')
        elif words[0] == "Contact" and words[2] == "(deg)":
            # only the plain 'Contact Angle (deg)' line, not Left/Right
            contact_angle = words[-1]
            f2.write("Contact Angle (deg): " + contact_angle + '\n\n')
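Instead of writing the values to a second file, the same loop could collect them in a list of (datetime, contact_angle) tuples, which is easy to tabulate and plot. A minimal sketch, assuming the '@' lines hold hhmm times and that only the plain 'Contact Angle (deg)' line is wanted; the function name read_contact_angles is hypothetical, and it accepts any iterable of lines (such as an open file):

```python
from datetime import datetime

def read_contact_angles(lines):
    """Return a list of (measurement datetime, contact angle) tuples."""
    rows = []
    date_part = None
    when = None
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0].startswith('!'):
            date_part = words[0][1:]  # 'dd/mm/yyyy'
        elif words[0].startswith('@'):
            # combine the current date with the '@hhmm' time
            when = datetime.strptime(date_part + ' ' + words[0][1:], '%d/%m/%Y %H%M')
        elif words[:3] == ['Contact', 'Angle', '(deg)']:
            rows.append((when, float(words[-1])))
    return rows

sample = """!05/04/2014
@1332
Contact Angle (deg)     106.87
Contact Angle Left (deg)    106.90

@1333
Contact Angle (deg)     105.42
""".splitlines()

for row in read_contact_angles(sample):
    print(row)
```

With a real file, `with open(infile) as f: rows = read_contact_angles(f)` would give the same kind of list.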

I've been playing with datetime too, and have some code that calculates the time since the treatment from a single input, but I would need this to apply for each date and time in the input file.

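from datetime import datetime

dt = input("Enter treatment date and time in format: dd mm yyyy hh:mm\n")
#dt = '27 03 2014 12:06'

treatment = datetime.strptime(dt, '%d %m %Y %H:%M')

measurement = datetime(2014, 3, 27, 16, 22, 0)  # a single measurement time, hard-coded for testing
elapsed = measurement - treatment                # a timedelta
# total_seconds() includes whole days; .seconds alone would drop them
print(elapsed.total_seconds())
print(elapsed.total_seconds() / 60)      # minutes since treatment
print(elapsed.total_seconds() // 3600)   # whole hours since treatment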
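To apply this to every measurement rather than one hard-coded datetime, subtract the treatment datetime from each measurement and use timedelta.total_seconds(), which, unlike .seconds, also counts whole days. A sketch; the measurement list is hypothetical sample data, and the treatment time is fixed here instead of being read with input():

```python
from datetime import datetime

treatment = datetime.strptime('27 03 2014 12:06', '%d %m %Y %H:%M')

# hypothetical measurement datetimes, e.g. built from the '!' and '@' lines
measurements = [
    datetime(2014, 3, 27, 16, 22),
    datetime(2014, 3, 28, 9, 0),
    datetime(2014, 3, 29, 12, 6),   # exactly two days later: .seconds would be 0
]

minutes_since = [(m - treatment).total_seconds() / 60 for m in measurements]
hours_since = [(m - treatment).total_seconds() / 3600 for m in measurements]
print(minutes_since)  # [256.0, 1254.0, 2880.0]
```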

Finally, I would like to use matplotlib to plot the contact angle versus time since treatment.

If anyone could help me with this I would greatly appreciate it.

Upvotes: 0

Answers (2)

Oliver W.

Reputation: 13459

Your data clearly consists of records, so it is best organized that way.

Each record in your example starts with @ (and then what I'm assuming is a measurement index). Each of these records has an extra field: the date listed at the top.

from datetime import datetime

records = []
record = None
msr_date = None
with open(infile) as f:
    for line in f:
        line = line.strip()
        if line.startswith('!'):
            # the date applies to every record that follows it
            msr_date = datetime.strptime(line[1:], "%d/%m/%Y")
        elif line.startswith('@'):
            # start a new record and tag it with the current date
            record = {'measurement_date': msr_date}
            records.append(record)
        elif line and record is not None:
            # "Quantity (unit)   value": the last field is the value
            name, value = line.rsplit(None, 1)
            quantity = name.split('(')[0].strip()
            record[quantity] = float(value)
        # empty lines are skipped

At the end, you'll have records, a list of dictionaries. You could then iterate over it to store the data in a more efficient form, for example a numpy structured array.

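import numpy as np

# treatment_date: the treatment datetime entered by the user (a datetime.datetime)
arr = np.array([(d['Contact Angle'], d['measurement_date'], d['measurement_date'] - treatment_date)
                for d in records],
               dtype=[('contact_angle', 'f4'),
                      ('msr_date', 'datetime64[m]'),
                      ('lapse_time', 'timedelta64[m]')])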

Note that you'll have to check whether datetime64 is the format you need (have a look at this SO question for that).

With arr you have everything neatly placed in "columns" that you can access by name. For example, you could plot

plt.plot(arr['lapse_time'], arr['contact_angle']), but you'll have to tell matplotlib to use timedelta values for its independent variable, as shown here.
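A way to avoid telling matplotlib about timedeltas is to convert the timedelta64 column to plain floating-point minutes by dividing by np.timedelta64(1, 'm'); the result can be passed straight to plt.plot. A sketch with a small hand-made array standing in for arr (the values are illustrative, not parsed from the file):

```python
import numpy as np

arr = np.array([(106.87, np.timedelta64(120, 'm')),
                (105.42, np.timedelta64(150, 'm'))],
               dtype=[('contact_angle', 'f4'), ('lapse_time', 'timedelta64[m]')])

# timedelta64 / timedelta64 gives a float array: here, minutes since treatment
minutes = arr['lapse_time'] / np.timedelta64(1, 'm')
print(minutes)  # [120. 150.]
# plt.plot(minutes, arr['contact_angle'])
```

Dividing by np.timedelta64(1, 'h') instead would give hours.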

Upvotes: 1

m.wasowski

Reputation: 6387

Here is how you can parse such a file. Everything is stored in a dictionary containing dictionaries (turtles all the way down :). The main keys are the IDs (the text after @).

An alternative would be to store the data by date, each item being a list of dictionaries keyed by ID. But that is easiest to do with collections.defaultdict, which would probably confuse you a bit. So the solution below might not be the best, but it should be easier for you to understand.
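For comparison, the by-date alternative could look roughly like this with collections.defaultdict, where each date maps to a list of per-ID dictionaries. A minimal sketch that keeps only the ID and the contact angle; the function name group_by_date is hypothetical:

```python
from collections import defaultdict

def group_by_date(lines):
    """Map each '!' date to a list of {'ID': ..., 'Contact Angle': ...} dicts."""
    by_date = defaultdict(list)   # a missing date starts as an empty list
    date = None
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('!'):
            date = stripped[1:]
        elif stripped.startswith('@'):
            current = {'ID': stripped[1:]}
            by_date[date].append(current)
        elif stripped.startswith('Contact Angle (deg)') and current is not None:
            current['Contact Angle'] = float(stripped.split()[-1])
    return by_date

sample = ['!05/04/2014', '@1332', 'Contact Angle (deg)     106.87', '',
          '@1333', 'Contact Angle (deg)     105.42']
print(dict(group_by_date(sample)))
```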

data = {}

date = ID = None

for line in datafile:  # datafile: the already-opened input file
    stripped = line.strip()
    if stripped.startswith('!'):
        date = stripped[1:]        # e.g. '05/04/2014'
    elif stripped.startswith('@'):
        ID = stripped[1:]          # e.g. '1332'
        data[ID] = {'date': date}
    elif stripped:  # line not all whitespace
        if not ID:
            continue  # we skip until we get the next ID
        try:
            words = stripped.split()
            value = float(words[-1])  # last word
            unit = words[-2].lstrip('(').rstrip(')')
            key = ' '.join(words[:-2])
            data[ID][key] = {'value': value, 'unit': unit}
        except (ValueError, IndexError):
            print("Could not parse this line:")
            print(line)
            continue
    else:  # an empty line ends the current record
        ID = None

I encourage you to analyse this line by line, looking up methods in https://docs.python.org/2/. If you really get stuck, ask in the comments and someone can give you a link to a more specific page. GL.

Upvotes: 4
