Reputation: 43
Problem description
Introductory remark: For the code have a look below
Let's say we have a pandas dataframe consisting of 3 columns and 2 rows. I'd like to add a 4th column called 'Max_LF' that will consist of an array. The value of the cell is retrieved by having a look at the column 'Max_WD'. For the first row that would be 0.35 which will than be compared to the values in the column 'WD' where 0.35 can be found at the third position. Therefore, the third value of the column 'LF' should be written into the column 'Max_LF'. If the value of 'Max_WD' occures multiple times in 'WD', then all corresponding items of 'LF' should be written into 'Max_LF'.
Failed attempt
So far I had various attemps on first retrieving the index of the item in 'Max_WD' in 'WD'. After potentially retrieving the index the idea was to then get the items of 'LF' via their index:
df4['temp_indices'] = [i for i, x in enumerate(df4['WD']) if x == df4['Max_WD']]
However, a ValueError occured: raise ValueError('Lengths must match to compare') ValueError: Lengths must match to compare
This is what the example dateframe looks like
df = pd.DataFrame(data={'LF': [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]] , 'WD': [[0.28, 0.34, 0.35, 0.18], [0.42, 0.45, 0.45, 0.18], [0.31, 0.21, 0.41, 0.41]], 'Max_WD': [0.35, 0.45, 0.41]})
The expected outcome should look like
df=pd.DataFrame(data={'LF': [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]] , 'WD': [[0.28, 0.34, 0.35, 0.18], [0.42, 0.45, 0.45, 0.18], [0.31, 0.21, 0.41, 0.41]], 'Max_WD': [0.35, 0.45, 0.41], 'Max_LF': [[3] ,[2,3], [3,4]]})
Upvotes: 2
Views: 1925
Reputation: 43
Thanks guys! With your help I was able to solve my problem. Like Prince Francis suggested I first did
df['temp'] = df.apply(lambda x : [i for i, e in enumerate(x['WD']) if e == x['Max_WD']], axis=1)
to get the indicees of the 'WD'-values in 'LF'. In a second stept I then could add the actual column 'Max_LF' by doing
df['LF_Max'] = df.apply(lambda x: [x['LF'][e] for e in (x['temp'])],axis=1)
Thanks a lot guys!
Upvotes: 1
Reputation: 3097
You could get it by simply using lambda
as follows
df['Max_LF'] = df.apply(lambda x : [i + 1 for i, e in enumerate(x['WD']) if e == x['Max_WD']], axis=1)
output is
LF Max_WD WD Max_LF
0 [1, 2, 3] 0.35 [0.28, 0.34, 0.35, 0.18] [3]
1 [1, 2, 3] 0.45 [0.42, 0.45, 0.45, 0.18] [2, 3]
2 [1, 2, 3] 0.41 [0.31, 0.21, 0.41, 0.41] [3, 4]
Upvotes: 3
Reputation: 2109
You can achieve it by applying a function over axis 1.
For this, I recommend you to first convert the WD list into a pd.Series (or a numpy.ndarray) and then compare all the values at once.
Assuming that you want a list of all the values higher than the threshold, you could use this:
>>> def get_max_wd(x):
... wd = pd.Series(x.WD)
... return list(wd[wd >= x.Max_WD])
...
>>> df.apply(get_max_wd, axis=1)
0 [0.35]
1 [0.45, 0.45]
2 [0.41, 0.41]
dtype: object
The result of the apply
can then be assigned as a new column into the dataframe:
df['Max_LF'] = df.apply(get_max_wd, axis=1)
If what you are after is only the maximum value (see my comment above), you can use the max()
method within the function.
Upvotes: 0