DandyApe
DandyApe

Reputation: 115

How to merge two pandas series if row values are the same

I am trying to merge two columns from a dataframe slice into one column called result.

def long_modifier(data):
    result= data['duration']/data['waiting']
    return result
def short_modifier(data):
    result = data['waiting']/data['duration']
    return result

max_testing_data['interval ratio']=max_testing_data.loc[max_testing_data['kind']=='long'].apply(long_modifier, axis=1)
max_testing_data['interval ratio_1']=max_testing_data.loc[max_testing_data['kind']=='short'].apply(short_modifier, axis=1)
frames = [max_testing_data['interval ratio'], max_testing_data['interval ratio_1']]
result = pd.concat(frames)
result

This is the dataframe I am basing it on.

enter image description here

expected output looks something like this: (values not exact)

       result
5      20 
15     20.2
17     .057
24     .055

Upvotes: 0

Views: 157

Answers (1)

constantstranger
constantstranger

Reputation: 9379

To recap, your question is about how to do the following:

  • Given a dataframe with columns duration, waiting and kind, merge two of these (duration and waiting) into a new column called result whose value is contingent upon the value in the kind column

This is one way to do it:

import pandas as pd
max_testing_data = pd.DataFrame([
    {'duration': 2883, 'waiting': 55, 'kind': 'short'},
    {'duration': 2167, 'waiting': 52, 'kind': 'short'},
    {'duration': 4800, 'waiting': 84, 'kind': 'long'},
    {'duration': 4533, 'waiting': 74, 'kind': 'long'}
])

def long_modifier(data):
    result= data['duration']/data['waiting']
    return result
def short_modifier(data):
    result = data['waiting']/data['duration']
    return result
max_testing_data['result']=max_testing_data.apply(lambda data: long_modifier(data) if data['kind']=='long' else short_modifier(data), axis=1)
result = max_testing_data[['kind', 'result']].sort_values(by='kind', ignore_index=True)
print(result)

Output:

    kind     result
0   long  57.142857
1   long  61.256757
2  short   0.019077
3  short   0.023996

If I have gotten any of the nuances of your question wrong, hopefully the ideas in the code above are helpful in any case.

UPDATE: To do this without using if statements, you can replace the assignment to the 'result' column with the following:

max_testing_data['result']=max_testing_data.apply(lambda data: (short_modifier(data), long_modifier(data))[data['kind']=='long'], axis=1)

Upvotes: 1

Related Questions