Reputation: 795

Python: Extract alternate columns from a file

I want to extract all doubles/floats from a file. Each line looks like this:

0    324.609    1    -39475.435    2     23.439    3    983.098
4    -4384.698    5    9475.405    6     2398.349    7    9800.138
...

Right now, I am building lists out of columns:

    y1 = [ line.split()[1] for line in data]
    y2 = [ line.split()[3] for line in data]
    y3 = [ line.split()[5] for line in data]
    y4 = [ line.split()[7] for line in data]

However, the index goes out of range if a line has no column 7. How do I prevent this? Also, is there a better way of extracting all doubles (including the minus sign) from a file?

Thank you.

Upvotes: 0

Answers (3)

DYZ

Reputation: 57033

You can spare yourself the misery of parsing a malformed data file by using Pandas. In the following example, I assume that the second line of the file is missing the last two columns:

    import pandas as pd

    # A raw string keeps the regex backslash from being read as a string escape.
    data = pd.read_table("yourfile.dat", sep=r'\s+', header=None, index_col=None)
    #    0         1  2          3  4         5    6        7
    # 0  0   324.609  1 -39475.435  2    23.439  3.0  983.098
    # 1  4 -4384.698  5   9475.405  6  2398.349  NaN      NaN

    # Missing fields are read as NaN; dropna() removes them from each column.
    y1 = data[1].dropna().tolist()
    y2 = data[3].dropna().tolist()
    y3 = data[5].dropna().tolist()
    y4 = data[7].dropna().tolist()
    y4
    # [983.0980000000001]
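
If the number of value columns varies, the four near-identical lines above could be collapsed into a single dictionary comprehension. The following is only a sketch under a few assumptions: the sample text is inlined through `io.StringIO` instead of being read from `yourfile.dat`, `pd.read_csv` with `sep=r"\s+"` is used in place of `read_table`, and the names `values` and `columns` are made up for illustration. It also assumes the first line has the most fields, as in the example.

```python
import io
import pandas as pd

# Inlined sample; the second line is missing its last two fields.
text = (
    "0    324.609    1    -39475.435    2     23.439    3    983.098\n"
    "4    -4384.698    5    9475.405    6     2398.349\n"
)

data = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Every second column, starting at index 1, holds the values.
values = data.iloc[:, 1::2]

# One list per value column, with missing (NaN) entries dropped.
columns = {label: values[label].dropna().tolist() for label in values.columns}
```

Here `columns[1]` plays the role of `y1`, `columns[3]` of `y2`, and so on.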

Upvotes: 2

Joe Patten

Reputation: 1704

You can use a try/except block when iterating over each line.

    y7 = []
    for line in data:
        try:
            y7.append(float(line.split()[7]))
        except (IndexError, ValueError):
            # The line is too short, or the field is not a number: skip it.
            pass

If a line has no column at index 7, it is skipped instead of raising an error.

If you want each list position to match its line number in the file (for example, so that the value from the 7th line is the 7th element of the list), append np.nan as a placeholder when the column is missing:

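    import numpy as np

    y7 = []
    for line in data:
        try:
            y7.append(float(line.split()[7]))
        except (IndexError, ValueError):
            # Keep a placeholder so list positions still match line numbers.
            y7.append(np.nan)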
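
As a variation on the loop above, the same idea could be wrapped in a small helper so that each column does not need its own loop. This is only a sketch: `column_as_floats` is a hypothetical name, `data` is assumed to be an iterable of text lines, and `float("nan")` stands in for `np.nan` to avoid the NumPy import.

```python
def column_as_floats(lines, index, fill=float("nan")):
    """Return the float at `index` from each line, or `fill` where it is missing."""
    values = []
    for line in lines:
        fields = line.split()
        try:
            values.append(float(fields[index]))
        except (IndexError, ValueError):
            # The line is too short, or the field is not a number.
            values.append(fill)
    return values


# Sample lines standing in for the file contents (assumed for illustration).
data = [
    "0    324.609    1    -39475.435    2     23.439    3    983.098",
    "4    -4384.698    5    9475.405    6     2398.349",
]

y1 = column_as_floats(data, 1)
y4 = column_as_floats(data, 7)  # second entry is NaN: that line has no index 7
```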

Upvotes: 0

nac001

Reputation: 795

To save alternate columns, generate the odd indices for each line and use them to pick fields from it.

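    y1 = []
    for line in data:
        fields = line.split()
        # Odd indices 1, 3, 5, ... up to however many fields this line has.
        for i in range(1, len(fields), 2):
            y1.append(fields[i])
    print(y1)

y1 is a list of all values in the odd-indexed columns, collected line by line. The values are still strings; wrap them in float() if you need numbers.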
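
The same selection can also be written with a slice step instead of an explicit index list. The sketch below assumes `data` is an iterable of text lines and that every odd-indexed field parses as a float; the function name `odd_column_values` is hypothetical.

```python
def odd_column_values(lines):
    """Collect the fields at odd indices (1, 3, 5, ...) from every line as floats."""
    values = []
    for line in lines:
        fields = line.split()
        # fields[1::2] takes every second field starting at index 1,
        # and simply stops early on shorter lines.
        values.extend(float(field) for field in fields[1::2])
    return values


# Sample lines standing in for the file contents (assumed for illustration).
data = [
    "0    324.609    1    -39475.435    2     23.439    3    983.098",
    "4    -4384.698    5    9475.405    6     2398.349",
]

values = odd_column_values(data)
```

Because slicing never raises on a short line, no try/except is needed for missing columns.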

Upvotes: 0
