Vikrant Srivastava
Vikrant Srivastava

Reputation: 129

Extract information from an Excel (by updating arrays) with Excel / Python

I have an Excel file with thousands of columns on the following format:

Member No. X Y Z
1000 25 60 -30
-69 38 68
45 2 43
1001 24 55 79
4 -7 89
78 51 -2
1002 45 -55 149
94 77 -985
-2 559 56

I need a way such that I shall get a new table with the absolute maximum value from each column. In this example, something like:

Member No. X Y Z
1000 69 60 68
1001 78 55 89
1002 94 559 985

I have tried it in Excel (with using VLOOKUP for finding the "Member Number" in the first row and then using HLOOKUP for finding the values from the rows thereafter), but the problem is that the HLOOKUP command is not automatically updated with the new array (the array in which Member number 1001 is) (so my solution works for member 1000, but not for 1001 and 1002), and hence it always searches for the new value ONLY in the 1st Row (i.e. the row with the member number 1000).

I also tried reading the file with Python, but I am not well-versed enough to make much of a headway - once the dataset has been read, how do I tell excel to read the next 3 rows and get the (absolute) maximum in each column?

Can someone please help? Solution required in Python 3 or Excel (ideally, Excel 2014).

Upvotes: 1

Views: 135

Answers (1)

sophocles
sophocles

Reputation: 13821

The below solution will get you your desired output using Python.

I first ffill to fill in the blanks in your Member No column (axis=0 means row-wise). Then convert your dataframe values to +ve using abs. Lastly, using pandas.DataFrame.agg, I get the max value for all the columns in your dataframe.

Assuming your dataframe is called data:

import pandas as pd

data['Member No.'] = data['Member No.'].ffill(axis=0).astype(int)
df = abs(df)
res = (data.groupby('Member No.').apply(lambda x: x.max())).drop('Member No.',axis=1).reset_index()

Which will print you:

   Member No.   X    Y    Z    A    B    C
0        1000  69   60   68   60   74   69
1        1001  78   55   89   78   92   87
2        1002  94  559  985  985  971  976

Note that I added extra columns in your sample data to make sure that all the columns will return their max() value.

Upvotes: 2

Related Questions