Guitarman045

Reputation: 37

How to compare and parse out NaN values in rows of a DataFrame in Pandas?

I'm trying to compare two values in the same row to each other to see which is greater. The two columns hold dates, like this:

PheneVisit  PheneDate   ER Date
phchp010v3  9/22/06 
phchp010v3  9/22/06     5/1/20
phchp101v1  9/14/09 
phchp101v1  9/14/09 

I want to put the greater value in a new column called "date order" using the code below, but it does not catch the blank/NaN values in the ER Date column and simply produces more NaN values in the new column when there is no ER Date.

Here is the code I have currently:

for index, row in data.iterrows():
    if (pd.to_datetime(row['PheneDate']) > pd.to_datetime(row['ER Date'])) or (row['ER Date'] == 0):
        print('1')
        data["date order"] = data['PheneDate']
    elif pd.to_datetime(row['ER Date']) > pd.to_datetime(row['PheneDate']) or [row['PheneDate'] == 0]:
        print(2)
        data["date order"] = data['ER Date']

which produces the following output:

date order

5/1/20

The desired output would be

9/22/06
5/1/20
9/14/09
9/14/09

I simply need to check each cell for NaN, and if one is empty, the non-empty date should be taken. I also tried == 'nan' and == 0.

Upvotes: 0

Views: 231

Answers (2)

user12014098

Reputation: 21

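Try numpy.where; convert both columns with pd.to_datetime first so the comparison is made on dates rather than on strings: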

df['new_col'] = np.where(df['ER Date'] > df['PheneDate'], df['ER Date'], df['PheneDate'])
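
A minimal, self-contained sketch of the same idea, in case it helps. The sample frame is rebuilt by hand from the question, and the explicit format string "%m/%d/%y" is an assumption about how the dates are written. Any comparison against NaT evaluates to False, so rows with no ER Date fall through to PheneDate; note that a missing PheneDate would not be handled by this version.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blanks are represented as None.
df = pd.DataFrame({
    "PheneVisit": ["phchp010v3", "phchp010v3", "phchp101v1", "phchp101v1"],
    "PheneDate": ["9/22/06", "9/22/06", "9/14/09", "9/14/09"],
    "ER Date": [None, "5/1/20", None, None],
})

# Parse the strings into datetimes; missing values become NaT.
# The format string is an assumption based on the sample rows.
phene = pd.to_datetime(df["PheneDate"], format="%m/%d/%y")
er = pd.to_datetime(df["ER Date"], format="%m/%d/%y")

# "er > phene" is False wherever er is NaT, so PheneDate is taken there.
df["new_col"] = np.where(er > phene, er, phene)

print(df)
```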

Upvotes: 0

mozway

Reputation: 260600

You can use pandas.to_datetime to convert to the datetime type and use normal comparison:

df['PheneDate'] = pd.to_datetime(df['PheneDate'])
df['ER'] = pd.to_datetime(df['ER'])
df['Date'] = df[['PheneDate', 'ER']].max(axis=1)

output:

   PheneVisit  PheneDate         ER       Date
0  phchp010v3 2006-09-22        NaT 2006-09-22
1  phchp010v3 2006-09-22 2020-05-01 2020-05-01
2  phchp101v1 2009-09-14        NaT 2009-09-14
3  phchp101v1 2009-09-14        NaT 2009-09-14
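
For reference, here is a runnable sketch of the row-wise max using the column names from the question ("ER Date" and "date order") instead of the shortened ones above. The frame is rebuilt by hand and the format string is an assumption. It works because max skips missing values by default (skipna=True), so a row yields NaT only when both dates are missing, and no explicit NaN check is needed.

```python
import pandas as pd

# Sample data rebuilt from the question; blanks are represented as None.
df = pd.DataFrame({
    "PheneVisit": ["phchp010v3", "phchp010v3", "phchp101v1", "phchp101v1"],
    "PheneDate": ["9/22/06", "9/22/06", "9/14/09", "9/14/09"],
    "ER Date": [None, "5/1/20", None, None],
})

# Convert both columns to datetimes; the format is assumed from the sample.
for col in ["PheneDate", "ER Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%y")

# Row-wise maximum; NaT entries are skipped because skipna defaults to True.
df["date order"] = df[["PheneDate", "ER Date"]].max(axis=1)

# To test for missing values explicitly, use isna() rather than == 'nan' or == 0.
missing_er = df["ER Date"].isna()

print(df)
```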

If you need to convert back to string with a specific regional formatting, use the following:

>>> df['Date'].dt.strftime('%m/%d/%y')

0    09/22/06
1    05/01/20
2    09/14/09
3    09/14/09

Finally, if you really want to avoid leading zeros in the month/day formatting:

>>> df['Date'].dt.strftime('%-m/%-d/%y')

0    9/22/06
1     5/1/20
2    9/14/09
3    9/14/09
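
Note that the %-m and %-d codes are a platform-specific strftime extension and are not available on Windows. A portable sketch that avoids them builds the string from the .dt.month and .dt.day accessors; the series below is hypothetical sample data, and the approach assumes there are no NaT values left in the column (with NaT present, the accessors return floats and the result would need extra handling).

```python
import pandas as pd

# Hypothetical sample of already-parsed dates (no missing values).
dates = pd.Series(pd.to_datetime(["2006-09-22", "2020-05-01", "2009-09-14"]))

# Month and day come back as integers, so converting them to str
# gives no leading zeros; the two-digit year still comes from strftime.
formatted = (
    dates.dt.month.astype(str)
    + "/"
    + dates.dt.day.astype(str)
    + "/"
    + dates.dt.strftime("%y")
)

print(formatted.tolist())
```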

Upvotes: 1
