Reputation: 129
I have an Excel file with thousands of columns on the following format:
Member No. | X | Y | Z |
---|---|---|---|
1000 | 25 | 60 | -30 |
-69 | 38 | 68 | |
45 | 2 | 43 | |
1001 | 24 | 55 | 79 |
4 | -7 | 89 | |
78 | 51 | -2 | |
1002 | 45 | -55 | 149 |
94 | 77 | -985 | |
-2 | 559 | 56 |
I need a way such that I shall get a new table with the absolute maximum value from each column. In this example, something like:
Member No. | X | Y | Z |
---|---|---|---|
1000 | 69 | 60 | 68 |
1001 | 78 | 55 | 89 |
1002 | 94 | 559 | 985 |
I have tried it in Excel (with using VLOOKUP
for finding the "Member Number" in the first row and then using HLOOKUP
for finding the values from the rows thereafter), but the problem is that the HLOOKUP
command is not automatically updated with the new array (the array in which Member number 1001 is) (so my solution works for member 1000, but not for 1001 and 1002), and hence it always searches for the new value ONLY in the 1st Row (i.e. the row with the member number 1000).
I also tried reading the file with Python, but I am not well-versed enough to make much of a headway - once the dataset has been read, how do I tell excel to read the next 3 rows and get the (absolute) maximum in each column?
Can someone please help? Solution required in Python 3 or Excel (ideally, Excel 2014).
Upvotes: 1
Views: 135
Reputation: 13821
The below solution will get you your desired output using Python.
I first ffill
to fill in the blanks in your Member No column (axis=0
means row-wise). Then convert your dataframe values to +ve using abs
. Lastly, using pandas.DataFrame.agg
, I get the max
value for all the columns in your dataframe.
Assuming your dataframe is called data
:
import pandas as pd
data['Member No.'] = data['Member No.'].ffill(axis=0).astype(int)
df = abs(df)
res = (data.groupby('Member No.').apply(lambda x: x.max())).drop('Member No.',axis=1).reset_index()
Which will print you:
Member No. X Y Z A B C
0 1000 69 60 68 60 74 69
1 1001 78 55 89 78 92 87
2 1002 94 559 985 985 971 976
Note that I added extra columns in your sample data to make sure that all the columns will return their max()
value.
Upvotes: 2