user3739039

Reputation: 3

Dealing with a list of lists in Python and selecting a subset of lists

Note: I'm using Python to read in this file.

I currently have a data file that is arranged as such:

1 0.1803 233.650000 101.52010 37.95730 96.41869
0.462300 1.425000e+12 1.811000e+12 1.710841e+10
0.456300 1.811000e+12 1.811000e+12 1.711282e+10
0.450300 9.443000e+11 9.443000e+11 9.842220e+09
0.444300 7.089000e+11 7.089000e+11 6.764462e+09

0 0.2523 462.060000 96.47176 48.58004 84.13097
0.456300 1.325000e+13 1.325000e+13 7.735244e+10
0.450300 1.283000e+13 1.283000e+13 7.684167e+10
0.444300 1.182000e+13 1.182000e+13 7.571757e+10
0.438300 1.002000e+13 1.002000e+13 7.352358e+10
0.432300 8.971000e+12 8.971000e+12 7.196254e+10

1 0.0000 74.230000 81.10059 46.28531 95.17891
0.342300 2.862000e+10 3.803000e+10 9.795136e+06

0 0.9493 776.060000 98.65339 41.54604 94.64194
1.000300 1.467000e+14 1.674000e+14 1.279873e+11
0.997300 1.467000e+14 1.674000e+14 1.280501e+11
0.994300 1.476000e+14 1.674000e+14 1.281122e+11

Essentially the data is a large list of lists, where each list is separated by a blank line. The first line of every list has 6 columns, and the subsequent lines all have 4 columns. The length of each list varies. I would like to be able to select only lists that fulfill certain criteria. For example, I would select only lists with a value of 0 for the first element of the first row, so it would pick only the 2nd and 4th lists in the example data above.

My idea for a solution: I would select only the first lines of every list and make a separate array of these values. Then I could find the indices where the first element is 0 using the where() function. Then I would select the lists that correspond to these indices.
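
The idea above could be sketched like this, assuming the six-column header lines have already been collected into a NumPy array (`first_rows` here is a hypothetical name, built from the example data):

```python
import numpy as np

# Hypothetical array of just the six-column first lines, one per list
first_rows = np.array([
    [1, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
    [0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
    [1, 0.0000, 74.23, 81.10059, 46.28531, 95.17891],
    [0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
])

# Indices of the lists whose first element is 0 (the 2nd and 4th here)
idx = np.where(first_rows[:, 0] == 0)[0]
```

The remaining (unsolved) step is building `first_rows` and the per-list arrays from the file, which is what the answers below address.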

The problem is I don't know how to deal with blank lines in my data. I'm not sure how to index lists separated by blank lines, nor do I know how to select only those rows of data that occur after a blank line. Anyone have any ideas as to how to implement my solution or does anybody have any other solutions? Thanks in advance.

Upvotes: 0

Views: 3567

Answers (4)

Jan Vlcinsky

Reputation: 44112

groupby number of items in a row > 0

Assuming you want to get a list of lists:

>>> import csv
>>> from itertools import groupby
>>> grouper = lambda rec: len(rec) > 0
>>> with open("data.txt") as f:
...     reader = csv.reader(f, delimiter=" ")
...     res = [list(items) for group, items in groupby(reader, key=grouper) if group]
...
>>> res
[[['1', '0.1803', '233.650000', '101.52010', '37.95730', '96.41869'],
  ['0.462300', '1.425000e+12', '1.811000e+12', '1.710841e+10'],
  ['0.456300', '1.811000e+12', '1.811000e+12', '1.711282e+10'],
  ['0.450300', '9.443000e+11', '9.443000e+11', '9.842220e+09'],
  ['0.444300', '7.089000e+11', '7.089000e+11', '6.764462e+09']],
 [['0', '0.2523', '462.060000', '96.47176', '48.58004', '84.13097'],
  ['0.456300', '1.325000e+13', '1.325000e+13', '7.735244e+10'],
  ['0.450300', '1.283000e+13', '1.283000e+13', '7.684167e+10'],
  ['0.444300', '1.182000e+13', '1.182000e+13', '7.571757e+10'],
  ['0.438300', '1.002000e+13', '1.002000e+13', '7.352358e+10'],
  ['0.432300', '8.971000e+12', '8.971000e+12', '7.196254e+10']],
 [['1', '0.0000', '74.230000', '81.10059', '46.28531', '95.17891'],
  ['0.342300', '2.862000e+10', '3.803000e+10', '9.795136e+06']],
 [['0', '0.9493', '776.060000', '98.65339', '41.54604', '94.64194'],
  ['1.000300', '1.467000e+14', '1.674000e+14', '1.279873e+11'],
  ['0.997300', '1.467000e+14', '1.674000e+14', '1.280501e+11'],
  ['0.994300', '1.476000e+14', '1.674000e+14', '1.281122e+11']]]

The function grouper takes a record as its argument (csv.reader yields each row as a list of strings) and returns True if the list is non-empty and False if it has no items.

If you group by this value, you get groups separated by the empty lines.

The only remaining step is to get rid of the small groups produced by the empty lines. A list comprehension allows filtering with a final if <condition> clause; here we can reuse the True or False provided by groupby.

groupby from itertools takes an iterable as its first argument, and the key argument defines a callable that computes the grouping value for each item. As soon as the grouping value changes, a new group is yielded.

groupby yields tuples: the first item is the value shaping the group (True or False), the second an iterable of all the items within that group.
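
A minimal illustration of this behaviour on a toy list, with empty sublists standing in for the blank lines:

```python
from itertools import groupby

# Toy input: empty sublists stand in for the blank lines in the file
rows = [["1", "2"], ["3"], [], ["4"]]

# key is True for non-empty rows, False for empty ones;
# consecutive rows with the same key form one group
groups = [(key, list(items))
          for key, items in groupby(rows, key=lambda rec: len(rec) > 0)]
```

The non-empty rows end up in True groups and each blank line forms its own False group, which the `if group` filter then drops.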

Converting strings to floats

If you want the numbers read as floats, we can define a function floater, which accepts an item of res as its argument and applies float to every value in its sublists:

def floater(lstlst):
    # wrap map in list so this also works on Python 3
    return [list(map(float, items)) for items in lstlst]

Then the solution will look like:

>>> import csv
>>> from itertools import groupby
>>> grouper = lambda rec: len(rec) > 0
>>> with open("data.txt") as f:
...     reader = csv.reader(f, delimiter=" ")
...     res = [floater(items) for group, items in groupby(reader, key=grouper) if group]
>>> res
[[[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
  [0.4623, 1425000000000.0, 1811000000000.0, 17108410000.0],
  [0.4563, 1811000000000.0, 1811000000000.0, 17112820000.0],
  [0.4503, 944300000000.0, 944300000000.0, 9842220000.0],
  [0.4443, 708900000000.0, 708900000000.0, 6764462000.0]],
 [[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
  [0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
  [0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
  [0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
  [0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
  [0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
 [[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891],
  [0.3423, 28620000000.0, 38030000000.0, 9795136.0]],
 [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
  [1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
  [0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
  [0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]
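
With `res` parsed as above, the selection the question actually asks for is then a single comprehension (shown here on a truncated copy of `res` so the snippet stands alone):

```python
# res as produced by the groupby parsing above (floats version, rows truncated)
res = [
    [[1.0, 0.1803, 233.65], [0.4623, 1.425e12]],
    [[0.0, 0.2523, 462.06], [0.4563, 1.325e13]],
    [[1.0, 0.0, 74.23], [0.3423, 2.862e10]],
    [[0.0, 0.9493, 776.06], [1.0003, 1.467e14]],
]

# Keep only groups whose first row starts with 0 (the 2nd and 4th here)
selected = [group for group in res if group[0][0] == 0]
```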

Upvotes: 3

beetea

Reputation: 308

List comprehensions make this very easy:

>>> s = open(yourfile).read().strip()
>>> data = [[list(map(float, row)) for row in map(str.split, sublist)] for sublist in (group.split('\n') for group in s.split('\n\n'))]
>>> result = [group for group in data if group[0][0] == 0]

First, let's parse this into something we can easily access programmatically.

A list of lists of lists seems reasonable to me, and something like the following would be ideal:

[[[1.0, 0.1803, 233.65, 101.5201, 37.9573, 96.41869],
  [0.4623, 1425000000000.0, 1811000000000.0, 17108410000.0],
  [0.4563, 1811000000000.0, 1811000000000.0, 17112820000.0],
  [0.4503, 944300000000.0, 944300000000.0, 9842220000.0],
  [0.4443, 708900000000.0, 708900000000.0, 6764462000.0]],
 [[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
  [0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
  [0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
  [0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
  [0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
  [0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
 [[1.0, 0.0, 74.23, 81.10059, 46.28531, 95.17891],
  [0.3423, 28620000000.0, 38030000000.0, 9795136.0]],
 [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
  [1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
  [0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
  [0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]

To do this, we can use list comprehensions:

>>> s = open(yourfile).read().strip()
>>> data = [[list(map(float, row)) for row in map(str.split, sublist)] for sublist in (group.split('\n') for group in s.split('\n\n'))]

This list comprehension is easiest to read from right to left:

  1. First, we split the input on consecutive newlines using split('\n\n') to give us a list of groups. That takes care of the "empty lines" problem you mentioned.
  2. Then, for each group we're splitting by '\n' to give us a list of sublists.
  3. For each row in each sublist, we're:
    1. Splitting by spaces using map(str.split, sublist) to give us a list of str
    2. Which we then convert to a list of floats via map(float, row).
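
The same steps, written out one at a time with intermediate names (a literal string stands in for the file contents here):

```python
# A literal string stands in for the file contents
s = "1 2.0 3.0\n0.5 6.0\n\n0 7.0 8.0\n0.9 1.0"

groups = s.split('\n\n')                    # step 1: split on blank lines
sublists = [g.split('\n') for g in groups]  # step 2: split each group into lines
data = [[[float(x) for x in row.split()]    # step 3: split rows, convert to float
         for row in sub] for sub in sublists]
```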

Now, onto selecting data based on certain conditions...

Again, we can use list comprehensions. To select only the groups that have 0 as the first element of the first row:

>>> result = [group for group in data if group[0][0] == 0]

This will result in:

[[[0.0, 0.2523, 462.06, 96.47176, 48.58004, 84.13097],
  [0.4563, 13250000000000.0, 13250000000000.0, 77352440000.0],
  [0.4503, 12830000000000.0, 12830000000000.0, 76841670000.0],
  [0.4443, 11820000000000.0, 11820000000000.0, 75717570000.0],
  [0.4383, 10020000000000.0, 10020000000000.0, 73523580000.0],
  [0.4323, 8971000000000.0, 8971000000000.0, 71962540000.0]],
 [[0.0, 0.9493, 776.06, 98.65339, 41.54604, 94.64194],
  [1.0003, 146700000000000.0, 167400000000000.0, 127987300000.0],
  [0.9973, 146700000000000.0, 167400000000000.0, 128050100000.0],
  [0.9943, 147600000000000.0, 167400000000000.0, 128112200000.0]]]

All done with some very powerful Python built-ins and without importing any modules!

Upvotes: 1

RageCage

Reputation: 740

I would try turning each block in the file into an actual list of lists in Python. That makes them much easier to work with, and you can then handle whatever cases you need far more easily by iterating through the lists rather than the file.

lists = []  # this will be your list of lists of lists (redundant enough for you?)
j = []
with open("whateverfilename.dat") as f:
    for line in f:
        if not line.strip():  # if the line is blank
            if j:
                lists.append(j)  # add the list of lists to your list of lists of lists
            j = []  # clear j for the next batch of data
        else:
            a = line.split()  # split each line of data into a list
            j.append(a)  # add it to the list of lists you are currently on
if j:  # don't lose the last batch if the file doesn't end with a blank line
    lists.append(j)

This will allow you to iterate through the data as regular lists, which is in my opinion far easier than iterating through the file.
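
Once `lists` is built, the selection could look something like this (note the loop above never converts anything, so the values are still strings; `lists` here is a hypothetical, truncated result of that loop):

```python
# Hypothetical result of the parsing loop above, truncated, values still strings
lists = [
    [["1", "0.1803"], ["0.4623", "1.425000e+12"]],
    [["0", "0.2523"], ["0.4563", "1.325000e+13"]],
]

# Keep only groups whose first row starts with 0
selected = [group for group in lists if float(group[0][0]) == 0]
```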

Upvotes: 0

user3820547

Reputation: 359

You will want to read the file and handle the different cases according to where you are in it.

Here's some annotated code for your pleasure:

def read_data(f):
    first, rest = None, []  # Reset data
    for line in f:  # Run over lines in the file
        if not line.strip():  # In case of empty line (or only whitespace)
            if first is not None:
                yield first, rest  # Yield the currently held values
            first, rest = None, []  # Reset data
            continue  # Skip this line
        if first is None:  # If we're at the beginning of a new set
            first = [float(x) for x in line.split()]  # Read it into "first"
            continue  # And go on
        # Otherwise, we're inside a list, so read that into rest
        rest.append([float(x) for x in line.split()])
    # The file is done, but if there was no trailing empty line
    # we didn't yield the last entry, so we yield it now
    if first is not None:
        yield first, rest
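
A condensed, self-contained version of the generator with a usage line that keeps only groups whose first value is 0 (io.StringIO stands in for the real file, and the sample data is truncated from the question):

```python
import io

def read_data(f):
    first, rest = None, []
    for line in f:
        if not line.strip():          # blank line: end of the current group
            if first is not None:
                yield first, rest
            first, rest = None, []
            continue
        if first is None:             # header line of a new group
            first = [float(x) for x in line.split()]
        else:                         # data line within the group
            rest.append([float(x) for x in line.split()])
    if first is not None:             # don't drop the final group at EOF
        yield first, rest

sample = "1 0.18 233.65\n0.46 1.4e12\n\n0 0.25 462.06\n0.45 1.3e13\n"
selected = [(first, rest) for first, rest in read_data(io.StringIO(sample))
            if first[0] == 0]
```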

Upvotes: 0
