Reputation: 2390
Suppose I have a matrix such as the following:
matrix = [
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
]
I'd like to create a new matrix that keeps only the columns given by a list of column indexes or names (a "filter in"), so that either of
filter_col_indx = {0,2}
filter_col_name = {'month','val2'}
would produce the same output:
matrix2 = [
['month','val2'],
['jan','200'],
['feb','201'],
['march','202'],
['april','203'],
['march','204']
]
For large matrices what would be the most efficient way to do this? The list_of_columns can vary.
Thanks
Upvotes: 1
Views: 1884
Reputation: 880419
This can be done using operator.itemgetter:
import operator
matrix = [
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
]
filter_col_indx = [0,2]
getter = operator.itemgetter(*filter_col_indx)
matrix2 = [list(getter(row)) for row in matrix]
print(matrix2)
yields
[['month', 'val2'],
['jan', '200'],
['feb', '201'],
['march', '202'],
['april', '203'],
['march', '204']]
operator.itemgetter(*filter_col_indx) returns a function which takes a sequence as its argument and returns a tuple of the items at index 0 and index 2 of that sequence. Thus, you can apply this function to each row to select the desired values from matrix.
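Since the question also asks about selecting by column name, here is a minimal sketch of how the names could be translated to indexes first. The helper select_columns is a hypothetical name, not part of any library, and it assumes the first row is a header with unique names. One thing to watch for: itemgetter called with a single index returns the bare item rather than a tuple (and it raises if given no indexes at all), so that case is handled separately.

```python
import operator

matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]

def select_columns(matrix, names):
    """Return a new matrix holding only the named columns, in the order given.

    Hypothetical helper: assumes matrix[0] is a header row with unique names.
    """
    header = matrix[0]
    # Map each header name to its position in the row.
    position = {name: i for i, name in enumerate(header)}
    indexes = [position[name] for name in names]  # KeyError if a name is unknown
    if len(indexes) <= 1:
        # itemgetter needs at least one index, and with exactly one it returns
        # the bare item instead of a tuple, so build these rows by hand.
        return [[row[i] for i in indexes] for row in matrix]
    getter = operator.itemgetter(*indexes)
    return [list(getter(row)) for row in matrix]

matrix2 = select_columns(matrix, ['month', 'val2'])
print(matrix2)
```

The getter is built once and reused for every row, so the name lookup cost is paid only once regardless of how many rows there are.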
If you install pandas, then you could make matrix a DataFrame and select the desired columns like this:
import pandas as pd
matrix = [
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
]
df = pd.DataFrame(matrix[1:], columns=matrix[0])
print(df[['month', 'val2']])
yields
month val2
0 jan 200
1 feb 201
2 march 202
3 april 203
4 march 204
You might enjoy using pandas since it makes a lot of data-munging operations very easy.
Upvotes: 3
Reputation: 1021
Here is a numpy version:
import numpy as np
matrix = np.array([
['month','val1','val2','valn'],
['jan','100','200','300'],
['feb','101','201','302'],
['march','102','202','303'],
['april','103','203','303'],
['march','104','204','304']
])
search = ['month', 'val2']
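# look up each name in the header row (the first row);
# searchsorted would only give correct results if the header happened to be sorted
indexes = [matrix[0].tolist().index(name) for name in search]
# or indexes = [0, 2]
print(matrix[:, indexes])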
>>> [['month' 'val2']
['jan' '200']
['feb' '201']
['march' '202']
['april' '203']
['march' '204']]
Upvotes: 1
Reputation: 23250
If you're always interested in whole columns, I think it would be appropriate to store the data using a dictionary containing the columns as lists:
data = {'month': ['jan', 'feb', 'march', 'april', 'march'],
'val1': [100, 101, 102, 103, 104],
'val2': [200, 201, 202, 203, 204],
...
}
To retrieve columns (which I have now written horizontally...), you do:
{key: data[key] for key in ['month', 'val2']}
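If the data arrives in the row-oriented form from the question, a rough sketch of the conversion could look like the following. It assumes the first row is a header with unique names, and it keeps the values as strings, as in the question, rather than converting them to numbers. The names wanted, selected and matrix2 are just illustrative.

```python
matrix = [
    ['month', 'val1', 'val2', 'valn'],
    ['jan', '100', '200', '300'],
    ['feb', '101', '201', '302'],
]

# Split off the header, then transpose the remaining rows into columns.
header, *rows = matrix
data = {name: list(column) for name, column in zip(header, zip(*rows))}

# Selecting columns is now a plain dictionary lookup per name.
wanted = ['month', 'val2']
selected = {key: data[key] for key in wanted}

# If the row-oriented form is needed again, transpose the selection back.
matrix2 = [wanted] + [list(row) for row in zip(*(data[key] for key in wanted))]
print(matrix2)
```

The transposition is paid once up front; after that, each selection only touches the columns that were asked for.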
Upvotes: 1