Reputation: 37
I'm trying to compare two values in the same row to each other to see which is greater. The two columns hold dates like this:
PheneVisit  PheneDate  ER Date
phchp010v3  9/22/06
phchp010v3  9/22/06    5/1/20
phchp101v1  9/14/09
phchp101v1  9/14/09
I want to put the later of the two dates in a new column called "dates" using the code below, but it does not handle the blank/NaN values in the ER Date column and simply produces more NaN values if there is no ER Date.
Here is the code I have currently:
for index, row in data.iterrows():
    if (pd.to_datetime(row['PheneDate']) > pd.to_datetime(row['ER Date'])) or (row['ER Date'] == 0):
        print('1')
        data["date order"] = data['PheneDate']
    elif pd.to_datetime(row['ER Date']) > pd.to_datetime(row['PheneDate']) or [row['PheneDate'] == 0]:
        print(2)
        data["date order"] = data['ER Date']
which produces the following output:
date order
5/1/20
The desired output would be
9/22/06
5/1/20
9/14/09
9/14/09
I simply need to check each cell for NaN and, if one of the two is empty, take the non-empty date. I also tried == 'nan' and == 0.
Upvotes: 0
Views: 231
Reputation: 21
Try numpy where:
df['new_col'] = np.where(df['ER Date'] > df['PheneDate'], df['ER Date'], df['PheneDate'])
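One caveat with this approach: on the raw strings, `>` compares text rather than dates, so the columns likely need to be parsed first. A minimal sketch, assuming the sample rows from the question, that blank ER Date cells are read as NaN, and that the dates follow a month/day/two-digit-year layout (the `format` string is an assumption):

```python
import numpy as np
import pandas as pd

# Sample rows from the question; missing ER Date cells are assumed to be NaN
df = pd.DataFrame({
    "PheneVisit": ["phchp010v3", "phchp010v3", "phchp101v1", "phchp101v1"],
    "PheneDate": ["9/22/06", "9/22/06", "9/14/09", "9/14/09"],
    "ER Date": [np.nan, "5/1/20", np.nan, np.nan],
})

# Parse both columns so the comparison is chronological, not alphabetical
phene = pd.to_datetime(df["PheneDate"], format="%m/%d/%y")
er = pd.to_datetime(df["ER Date"], format="%m/%d/%y")

# Any comparison against NaT is False, so rows without an ER Date
# fall through to the PheneDate branch
df["new_col"] = np.where(er > phene, df["ER Date"], df["PheneDate"])
print(df["new_col"].tolist())
```

This only covers a missing ER Date. If PheneDate can also be missing, the comparison is False there as well and the result stays NaN, so a follow-up such as `df["new_col"].fillna(df["ER Date"])` would be needed.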
Upvotes: 0
Reputation: 260600
You can use pandas.to_datetime to convert to the datetime type and use normal comparison:
df['PheneDate'] = pd.to_datetime(df['PheneDate'])
df['ER'] = pd.to_datetime(df['ER'])
df['Date'] = df[['PheneDate', 'ER']].max(axis=1)
output:
   PheneVisit  PheneDate         ER       Date
0  phchp010v3 2006-09-22        NaT 2006-09-22
1  phchp010v3 2006-09-22 2020-05-01 2020-05-01
2  phchp101v1 2009-09-14        NaT 2009-09-14
3  phchp101v1 2009-09-14        NaT 2009-09-14
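The row-wise max handles the empty cells because `DataFrame.max` skips missing values by default (`skipna=True`), so a NaT in one column simply leaves the other date as the maximum. A self-contained sketch of the same idea, assuming instead that the question's frame has a single column named `ER Date` (rather than separate `ER` and `Date` columns) and that the result goes into the asker's `date order` column:

```python
import numpy as np
import pandas as pd

# Sample rows from the question, keeping the original "ER Date" column name
data = pd.DataFrame({
    "PheneVisit": ["phchp010v3", "phchp010v3", "phchp101v1", "phchp101v1"],
    "PheneDate": ["9/22/06", "9/22/06", "9/14/09", "9/14/09"],
    "ER Date": [np.nan, "5/1/20", np.nan, np.nan],
})

# Parse both date columns; NaN becomes NaT (the format string is an assumption)
cols = ["PheneDate", "ER Date"]
parsed = data[cols].apply(pd.to_datetime, format="%m/%d/%y")

# max(axis=1) ignores NaT by default, so the non-empty date wins
data["date order"] = parsed.max(axis=1)
print(data)
```

No explicit NaN check or row loop is needed; a row comes out as NaT only if both dates are missing.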
If you need to convert back to string with a specific regional formatting, use the following:
>>> df['Date'].dt.strftime('%m/%d/%y')
0 09/22/06
1 05/01/20
2 09/14/09
3 09/14/09
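Finally, if you really want to avoid leading zeros in the month/day formatting (note that the - flag is a platform-specific strftime extension: it works on Linux and macOS, while Windows uses # instead, e.g. '%#m/%#d/%y'):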
>>> df['Date'].dt.strftime('%-m/%-d/%y')
0 9/22/06
1 5/1/20
2 9/14/09
3 9/14/09
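Where the `%-m` form is not available, the same strings can be assembled from the date components instead. A small sketch, assuming a datetime Series with no missing values (the `dates` and `short` names are illustrative only):

```python
import pandas as pd

# Illustrative datetime Series; assumed to contain no NaT values
dates = pd.Series(pd.to_datetime(["2006-09-22", "2020-05-01", "2009-09-14"]))

# Month and day as plain integers have no leading zeros;
# only the two-digit year still goes through strftime
short = (
    dates.dt.month.astype(str)
    + "/"
    + dates.dt.day.astype(str)
    + "/"
    + dates.dt.strftime("%y")
)
print(short.tolist())
```

With missing dates, `dt.month` and `dt.day` would become floats, so those rows would need to be filtered or filled first.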
Upvotes: 1