BelottiGhilardi
BelottiGhilardi

Reputation: 75

Pandas dropna messes up datetime index

I am writing a function that processes a dataframe. Rows in this dataframe are indexed by a datetime index and there is a row per hour in the dataframe. Basically, after doing some processing, this is what I have:

                     inquinante  temperatura  precipitazioni  ...  umidita  day_of_year  day_of_week
Data                                                          ...                                   
2000-07-04 00:00:00        55.0         23.9             0.0  ...     86.8          186            1
2000-07-04 01:00:00         NaN         23.4             0.0  ...     86.2          186            1
2000-07-04 02:00:00         NaN         22.7             0.0  ...     92.5          186            1
2000-07-04 03:00:00         NaN         22.1             0.0  ...     97.5          186            1
2000-07-04 04:00:00         NaN         22.2             0.0  ...     95.9          186            1

Now I want to filter out the rows for which the value for the column 'inquinante' is NaN, so I wrote the following line of code:

df = df.dropna(subset=["inquinante"])

but what I get after it executes is the following:

           inquinante  temperatura  precipitazioni  ...    umidita  day_of_year  day_of_week
Data                                                 ...                                     
2014-01-31        25.0     4.700000        1.000000  ...  95.700000           31            4
2014-02-01        31.0     5.800000        0.000000  ...  94.800000           32            5
2014-02-02        20.0     6.100000        1.800000  ...  97.300000           33            6
2014-02-03        17.0     6.700000        0.600000  ...  96.300000           34            0
2014-02-04        18.0     6.600000        0.800000  ...  97.200000           35            1

Why now my dates are gouped by days and not hours like they were before? I also tried to change the line of code to:

df = df[df.inquinante >= 0]
#or
df = df[df.inquinante.notna()]

But none of these seemed to fix the problem. Is there any way I can fix this and prevent pandas from grouping my dates?

Thanks in advance

Upvotes: 0

Views: 738

Answers (3)

Raghul Raj
Raghul Raj

Reputation: 1458

If only the format of your date changed and not the values, You can always convert it back as follows:

df['Date'] = df['Date'].dt.strftime('%Y-%m-%d %H:%M:%S')

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

This is the automatic representation of a datetime index when all the index labels have midnight or time 00:00:00 as its time stamp.

df = pd.DataFrame({'value':np.arange(20)}, index=pd.date_range('2020-02-01', periods=20, freq='12H'))
df 

Output:

                     value
2020-02-01 00:00:00      0
2020-02-01 12:00:00      1
2020-02-02 00:00:00      2
2020-02-02 12:00:00      3
2020-02-03 00:00:00      4
2020-02-03 12:00:00      5
2020-02-04 00:00:00      6
2020-02-04 12:00:00      7
2020-02-05 00:00:00      8
2020-02-05 12:00:00      9
2020-02-06 00:00:00     10
2020-02-06 12:00:00     11
2020-02-07 00:00:00     12
2020-02-07 12:00:00     13
2020-02-08 00:00:00     14
2020-02-08 12:00:00     15
2020-02-09 00:00:00     16
2020-02-09 12:00:00     17
2020-02-10 00:00:00     18
2020-02-10 12:00:00     19

Now, let's drop all time where hour == 12 leaving only the midnight timestamp:

df[df.index.hour != 12]

Output:

            value
2020-02-01      0
2020-02-02      2
2020-02-03      4
2020-02-04      6
2020-02-05      8
2020-02-06     10
2020-02-07     12
2020-02-08     14
2020-02-09     16
2020-02-10     18

That is still a datetimeindex and each label does have a timestamp.

df[df.index.hour != 12].index.strftime('%Y-%m-%d %H:%M:%S')

Output:

Index(['2020-02-01 00:00:00', '2020-02-02 00:00:00', '2020-02-03 00:00:00',
       '2020-02-04 00:00:00', '2020-02-05 00:00:00', '2020-02-06 00:00:00',
       '2020-02-07 00:00:00', '2020-02-08 00:00:00', '2020-02-09 00:00:00',
       '2020-02-10 00:00:00'],
      dtype='object')

Upvotes: 2

Sajan
Sajan

Reputation: 1267

You could try something like this -

temp = df.reset_index()
temp = temp[temp['inquinante'].notna()].set_index('Data')

Upvotes: 1

Related Questions