Alg_D

Reputation: 2390

python - create new sub-matrix by filtering columns from matrix/bidimensional list

For instance, given the following matrix:

matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]

I'd like to create a new matrix that keeps only the columns given by a list of column indexes or names (filter in), so that either of

filter_col_indx = {0,2}
filter_col_name = {'month','val2'}

would produce the same output:

matrix2 = [
    ['month','val2'],
    ['jan','200'],
    ['feb','201'],
    ['march','202'],
    ['april','203'],
    ['march','204']
]

For large matrices, what would be the most efficient way to do this? The list of columns can vary.

Thanks

Upvotes: 1

Views: 1884

Answers (3)

unutbu

Reputation: 880419

This can be done using operator.itemgetter:

import operator
matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]

filter_col_indx = [0,2]
getter = operator.itemgetter(*filter_col_indx)
matrix2 = [list(getter(row)) for row in matrix]
print(matrix2)

yields

[['month', 'val2'],
 ['jan', '200'],
 ['feb', '201'],
 ['march', '202'],
 ['april', '203'],
 ['march', '204']]

operator.itemgetter(*filter_col_indx) returns a function which takes a sequence as its argument and returns a tuple of the 0th and 2nd items from the sequence. Thus, you can apply this function to each row to select the desired values from matrix.
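
The question also asks about filtering by column name. One way to cover that, sketched here under the assumption that the first row is the header and its names are unique, is to translate the names into indexes and then reuse the same getter. The helper name select_columns is made up for this illustration. Note that itemgetter called with a single index returns the bare item rather than a tuple, so the sketch handles short index lists separately:

```python
import operator

def select_columns(matrix, names):
    """Return a new matrix holding only the named columns (hypothetical helper)."""
    header = matrix[0]
    # Map each header name to its position; assumes header names are unique.
    position = {name: i for i, name in enumerate(header)}
    indexes = [position[name] for name in names]  # KeyError on an unknown name
    if len(indexes) < 2:
        # itemgetter needs at least one index, and with exactly one it returns
        # the item itself instead of a tuple, so fall back to plain indexing.
        return [[row[i] for i in indexes] for row in matrix]
    getter = operator.itemgetter(*indexes)
    return [list(getter(row)) for row in matrix]

matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]
matrix2 = select_columns(matrix, ['month', 'val2'])
print(matrix2)
```

Passing the names as a list rather than a set keeps the order of the output columns predictable.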


If you install pandas, then you could make matrix a DataFrame and select the desired columns like this:

import pandas as pd

matrix = [
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])
print(df[['month', 'val2']])

yields

   month val2
0    jan  200
1    feb  201
2  march  202
3  april  203
4  march  204

You might enjoy using pandas since it makes a lot of data-munging operations very easy.

Upvotes: 3

igavriil

Reputation: 1021

Here is a numpy version:

import numpy as np

matrix = np.array([
    ['month','val1','val2','valn'],
    ['jan','100','200','300'],
    ['feb','101','201','302'],
    ['march','102','202','303'],
    ['april','103','203','303'],
    ['march','104','204','304']
])

search = ['month', 'val2']

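# searchsorted assumes the first row is sorted (it is here); for an unsorted
# header, use [matrix[0].tolist().index(name) for name in search] instead
indexes = matrix[0,:].searchsorted(search)  # search only the first row
# or indexes = [0, 2]
print(matrix[:,indexes])
# [['month' 'val2']
#  ['jan' '200']
#  ['feb' '201']
#  ['march' '202']
#  ['april' '203']
#  ['march' '204']]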

Upvotes: 1

mkrieger1

Reputation: 23250

If you're always interested in whole columns, I think it would be appropriate to store the data using a dictionary containing the columns as lists:

data = {'month': ['jan', 'feb', 'march', 'april', 'march'],
        'val1': [100, 101, 102, 103, 104],
        'val2': [200, 201, 202, 203, 204],
        ...
       }

To retrieve columns (which I have now written horizontally...), you do:

{key: data[key] for key in ['month', 'val2']}
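
If the data arrives in the row-oriented form from the question, it can be turned into such a dictionary by transposing it. A minimal sketch, assuming the first row holds unique column names and every row has the same length (zip silently truncates to the shortest row otherwise):

```python
matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]

header, rows = matrix[0], matrix[1:]
# zip(*rows) transposes the data rows into columns, one tuple per column.
data = {name: list(column) for name, column in zip(header, zip(*rows))}

# Selecting columns is then a dictionary lookup per requested name.
subset = {key: data[key] for key in ['month', 'val2']}
print(subset)
```

The values stay strings here, as in the question's matrix; converting them to numbers would be a separate step.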

Upvotes: 1
