I want to combine two columns of a pandas DataFrame based on their values. In each row, the two values are in one of three states:
A) Both cells have the same value,
B) Only one cell has a value,
C) The cells have different values.
For example:
I want to merge these into a single column based on how the two values compare. If they are equal, the result should be that value. If one cell is blank (NaN), the result should be the non-NaN value. If they differ, I want to raise a flag (something like "At row N, cell 1 does not match cell 2"; the exact message is not important).
So using the above example, the column would look like:
The "!" is just a placeholder. How could I go about doing this?
This question is similar, but I also need the function to take the two cell values into account: Combine two columns of text in dataframe in pandas/python
Apologies for using Excel screenshots; I wasn't sure how to create a proper table here.
Answer 1:
You can use the Series.combine method in pandas, which applies a function element-wise to pairs of values from two Series:
import pandas as pd
import numpy as np
df = pd.DataFrame({"departure": [327, 427, 429, np.nan], "arrival": [np.nan, 427, 431, 457]})
# Equal -> that value; one side NaN -> the other side; two different values -> "!"
selection_rule = lambda s1, s2: s1 if s1 == s2 else (s1 if pd.isna(s2) else (s2 if pd.isna(s1) else "!"))
df['time'] = df['departure'].combine(df['arrival'], selection_rule)
>>> df
   departure  arrival   time
0      327.0      NaN  327.0
1      427.0    427.0  427.0
2      429.0    431.0      !
3        NaN    457.0  457.0
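The "!" placeholder marks a conflict but does not say where it occurred. To get the row-level message described in the question instead, here is a minimal sketch. It assumes the same departure/arrival frame as above, and mismatch_messages is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd


def mismatch_messages(df, col1, col2):
    """Return one message per row where both cells are filled but differ.

    mismatch_messages is a hypothetical helper, not part of pandas.
    """
    a, b = df[col1], df[col2]
    # True only where both values are present and unequal;
    # rows with a NaN on either side are not treated as conflicts.
    conflict = a.notna() & b.notna() & (a != b)
    return [
        f"At row {idx}, {col1} ({a[idx]}) does not match {col2} ({b[idx]})"
        for idx in conflict[conflict].index
    ]


df = pd.DataFrame({"departure": [327, 427, 429, np.nan],
                   "arrival": [np.nan, 427, 431, 457]})

for msg in mismatch_messages(df, "departure", "arrival"):
    print(msg)
```

If you would rather stop processing than just report, you could raise a ValueError with "\n".join(messages) whenever the returned list is non-empty.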
Answer 2:
Unless your DataFrame is very large (so that calling a Python function on every row would be slow), I'd say the simplest approach is to write a function and apply it row by row:
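import pandas as pd

def get_time(row):
    departure = row['Departure']
    arrival = row['Arrival']
    # Only one side is filled: use that side
    if pd.isnull(departure) and pd.notnull(arrival):
        return arrival
    if pd.isnull(arrival) and pd.notnull(departure):
        return departure
    # Both sides hold the same value
    if departure == arrival:
        return departure
    # Different values (a row where both are NaN also ends up here)
    return "Your Error"

df['Time'] = df.apply(get_time, axis=1)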
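If the frame is large enough that a row-wise apply becomes slow, the same rules can be expressed with whole-column operations instead. This is a minimal sketch, not part of the original answer. It assumes the Departure/Arrival column names used here and borrows the sample values from the other answer; combine_columns is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def combine_columns(s1, s2, flag="!"):
    """Vectorised merge of two aligned Series (hypothetical helper).

    - equal values, or one side NaN -> the non-NaN value
    - two different non-NaN values  -> flag
    """
    # Take s1 where it has a value, otherwise fall back to s2.
    merged = s1.combine_first(s2)
    # Conflict: both present and not equal.
    conflict = s1.notna() & s2.notna() & (s1 != s2)
    # Cast to object first so the string flag can sit next to numbers.
    return merged.astype(object).mask(conflict, flag)


df = pd.DataFrame({"Departure": [327, 427, 429, np.nan],
                   "Arrival": [np.nan, 427, 431, 457]})
df["Time"] = combine_columns(df["Departure"], df["Arrival"])
print(df)
```

Unlike get_time, this version leaves a row where both cells are NaN as NaN instead of flagging it.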