Dominik Vogt
Dominik Vogt

Reputation: 43

Adding a column to a pandas dataframe based on other columns

Problem description

Introductory remark: For the code have a look below

Let's say we have a pandas dataframe consisting of 3 columns and 2 rows. I'd like to add a 4th column called 'Max_LF' that will consist of an array. The value of the cell is retrieved by having a look at the column 'Max_WD'. For the first row that would be 0.35 which will than be compared to the values in the column 'WD' where 0.35 can be found at the third position. Therefore, the third value of the column 'LF' should be written into the column 'Max_LF'. If the value of 'Max_WD' occures multiple times in 'WD', then all corresponding items of 'LF' should be written into 'Max_LF'.

Failed attempt

So far I had various attemps on first retrieving the index of the item in 'Max_WD' in 'WD'. After potentially retrieving the index the idea was to then get the items of 'LF' via their index: df4['temp_indices'] = [i for i, x in enumerate(df4['WD']) if x == df4['Max_WD']]

However, a ValueError occured: raise ValueError('Lengths must match to compare') ValueError: Lengths must match to compare

This is what the example dateframe looks like

df = pd.DataFrame(data={'LF': [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]] , 'WD': [[0.28, 0.34, 0.35, 0.18], [0.42, 0.45, 0.45, 0.18], [0.31, 0.21, 0.41, 0.41]], 'Max_WD': [0.35, 0.45, 0.41]})

The expected outcome should look like

df=pd.DataFrame(data={'LF': [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]] , 'WD': [[0.28, 0.34, 0.35, 0.18], [0.42, 0.45, 0.45, 0.18], [0.31, 0.21, 0.41, 0.41]], 'Max_WD': [0.35, 0.45, 0.41], 'Max_LF': [[3] ,[2,3], [3,4]]})

Upvotes: 2

Views: 1925

Answers (3)

Dominik Vogt
Dominik Vogt

Reputation: 43

Thanks guys! With your help I was able to solve my problem. Like Prince Francis suggested I first did

df['temp'] = df.apply(lambda x : [i for i, e in enumerate(x['WD']) if e == x['Max_WD']], axis=1)

to get the indicees of the 'WD'-values in 'LF'. In a second stept I then could add the actual column 'Max_LF' by doing

df['LF_Max'] = df.apply(lambda x: [x['LF'][e] for e in (x['temp'])],axis=1)

Thanks a lot guys!

Upvotes: 1

Prince Francis
Prince Francis

Reputation: 3097

You could get it by simply using lambda as follows

df['Max_LF'] = df.apply(lambda x : [i + 1 for i, e in enumerate(x['WD']) if e == x['Max_WD']], axis=1)

output is

    LF  Max_WD  WD  Max_LF
0   [1, 2, 3]   0.35    [0.28, 0.34, 0.35, 0.18]    [3]
1   [1, 2, 3]   0.45    [0.42, 0.45, 0.45, 0.18]    [2, 3]
2   [1, 2, 3]   0.41    [0.31, 0.21, 0.41, 0.41]    [3, 4]

Upvotes: 3

Carles Sala
Carles Sala

Reputation: 2109

You can achieve it by applying a function over axis 1.

For this, I recommend you to first convert the WD list into a pd.Series (or a numpy.ndarray) and then compare all the values at once.

Assuming that you want a list of all the values higher than the threshold, you could use this:

>>> def get_max_wd(x): 
...     wd = pd.Series(x.WD) 
...     return list(wd[wd >= x.Max_WD])
... 
>>> df.apply(get_max_wd, axis=1)
0          [0.35]
1    [0.45, 0.45]
2    [0.41, 0.41]
dtype: object

The result of the apply can then be assigned as a new column into the dataframe:

df['Max_LF'] = df.apply(get_max_wd, axis=1)

If what you are after is only the maximum value (see my comment above), you can use the max() method within the function.

Upvotes: 0

Related Questions