Fred

Reputation: 67

Combining two columns in a pandas dataframe depending on their value

I want to combine two columns of a dataframe depending on their values. The values per row are going to be in one of three states:

A) Both cells have the same value

B) Only one cell has a value

C) The cells have different values

For example:

[Image: example table showing the two columns]

I want to make this into one column depending on their similarity. If they are equal, the result should just be that value. If one is blank, it should be the non-NaN value. And if they are not equal to each other, I want to raise a flag (something like "At row N, cell 1 does not match cell 2"; it is not important what exactly it is).

So using the above example, the column would look like: [Image: the resulting combined column]

The "!" is just a placeholder. How could I go about doing this?

This is an example of a similar problem, but I also need the added step of the function considering what the two cell values are: "Combine two columns of text in dataframe in pandas/python".

Apologies for using Excel screenshots; I'm not sure how to properly create a table here.

Upvotes: 4

Views: 366

Answers (2)

Oleg

Reputation: 620

You can use the Series.combine method in pandas, which applies a function to each pair of values from the two columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({"departure": [327, 427, 429, np.nan], "arrival": [np.nan, 427, 431, 457]})

def selection_rule(s1, s2):
    # Called once per row with the two scalar values.
    if s1 == s2 or np.isnan(s2):
        return s1
    # s2 is present here; use it if s1 is missing, otherwise flag the mismatch.
    return s2 if np.isnan(s1) else "!"

df['time'] = df['departure'].combine(df['arrival'], selection_rule)


>>> df
   departure  arrival   time
0      327.0      NaN    327
1      427.0    427.0    427
2      429.0    431.0      !
3        NaN    457.0    457
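
If the flag should name the row, as the question suggests, the mismatching rows can also be located separately. This is only a sketch that reuses the sample dataframe above; the message wording and the variable names (both_present, mismatch, messages) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"departure": [327, 427, 429, np.nan],
                   "arrival": [np.nan, 427, 431, 457]})

# A mismatch needs both cells filled in AND different values.
both_present = df["departure"].notna() & df["arrival"].notna()
mismatch = both_present & (df["departure"] != df["arrival"])

# Build one message per mismatching row (hypothetical wording).
messages = [
    f"At row {idx}, departure ({dep}) does not match arrival ({arr})"
    for idx, dep, arr in df.loc[mismatch, ["departure", "arrival"]].itertuples()
]

for message in messages:
    print(message)
```

With the sample data, only row 2 (429 vs. 431) is reported.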

Upvotes: 2

eva-vw

Reputation: 670

Unless your dataframe is so large that applying a function row by row would take a long time, I'd say the best way is to just write a function for this.


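import pandas as pd

def get_time(row):
    departure = row['Departure']
    arrival = row['Arrival']

    # Only one of the two cells has a value: return that value.
    if pd.isnull(departure) and pd.notnull(arrival):
        return arrival
    if pd.isnull(arrival) and pd.notnull(departure):
        return departure
    # Both cells have a value and they agree.
    if departure == arrival:
        return departure
    # The values differ. Note that a row where both cells are NaN
    # also ends up here, because NaN == NaN is False.
    return "Your Error"

# axis=1 passes each row to get_time as a Series.
df['Time'] = df.apply(get_time, axis=1)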
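
If the dataframe does get large enough that a row-wise apply becomes slow, a vectorized variant is possible. The following is a minimal sketch and not part of the approach above: the helper name combine_columns and its flag parameter are hypothetical, and the sample values are borrowed from the other answer.

```python
import numpy as np
import pandas as pd

def combine_columns(df, left, right, flag="!"):
    """Hypothetical helper: merge two columns and flag conflicting rows."""
    a, b = df[left], df[right]
    # Take the left value, falling back to the right one where left is NaN.
    # Cast to object so the column can hold the string flag as well.
    merged = a.combine_first(b).astype(object)
    # A conflict is a row where both values are present and differ.
    conflict = a.notna() & b.notna() & (a != b)
    # Replace conflicting rows with the flag; other rows are kept.
    return merged.mask(conflict, flag)

df = pd.DataFrame({"Departure": [327, 427, 429, np.nan],
                   "Arrival": [np.nan, 427, 431, 457]})
df["Time"] = combine_columns(df, "Departure", "Arrival")
print(df)
```

Unlike get_time, this sketch leaves a row as NaN when both cells are missing instead of flagging it; whether that is the desired behavior depends on the data.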

Upvotes: 0
