spidermarn

Reputation: 939

Python - How to clean time series data

I have a df which looks like this:

df = pd.DataFrame({'Date':['2019-09-23','2019-09-24','2019-09-25','2019-09-26','2019-09-27','2019-09-28','2019-09-29'],
                  'Sep':[1,10,5,'NaN','NaN','NaN','NaN'],
                  'Dec':[2,8,4,7,9,1,5]})

I'm trying to create a new column called 'First_Contract':

  1. 'First_Contract' needs to take the third-last value of the 'Sep' column before the 'Sep' column reaches NaN.
  2. The subsequent values need to be filled with the 'Dec' column values.

Desired output:

df2= pd.DataFrame({'Date':['2019-09-23','2019-09-24','2019-09-25','2019-09-26','2019-09-27','2019-09-28','2019-09-29'],
                  'Sep':[1,10,5,'NaN','NaN','NaN','NaN'],
                  'Dec':[2,8,4,7,9,1,5],
                  'First_Contract':[1,8,4,7,9,1,5]})

How do I go about achieving this?

Upvotes: 0

Views: 688

Answers (3)

BENY

Reputation: 323306

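Let us do it step by step: convert the 'NaN' strings to real NaN, copy 'Dec' into the new column, then overwrite the row two positions before the last valid 'Sep' index with the 'Sep' value.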

df['Sep'] = df['Sep'].replace({'NaN': np.nan})  # assign back instead of inplace on an attribute access
df['FC'] = df['Dec']
ids = df.Sep.last_valid_index()-2
df.loc[ids,'FC'] = df.Sep[ids]
df
Out[126]: 
         Date   Sep  Dec  First_Contract   FC
0  2019-09-23   1.0    2               1  1.0
1  2019-09-24  10.0    8               8  8.0
2  2019-09-25   5.0    4               4  4.0
3  2019-09-26   NaN    7               7  7.0
4  2019-09-27   NaN    9               9  9.0
5  2019-09-28   NaN    1               1  1.0
6  2019-09-29   NaN    5               5  5.0
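
For reference, the steps above can be put together as a self-contained script. This is only a minimal sketch: it assumes the default integer index (so that `last_valid_index() - 2` is a valid row label), and it swaps in `pd.to_numeric` for `replace` to convert the 'NaN' strings, which is a substitution rather than what the answer originally ran.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2019-09-23', '2019-09-24', '2019-09-25', '2019-09-26',
                            '2019-09-27', '2019-09-28', '2019-09-29'],
                   'Sep': [1, 10, 5, 'NaN', 'NaN', 'NaN', 'NaN'],
                   'Dec': [2, 8, 4, 7, 9, 1, 5]})

# Convert the 'NaN' strings to real missing values; Sep becomes float64.
df['Sep'] = pd.to_numeric(df['Sep'], errors='coerce')

# Start from Dec (as float, so a Sep value can be written without a dtype change).
df['FC'] = df['Dec'].astype(float)

# Label of the third-last valid Sep row (assumes a default 0..n-1 index).
ids = df['Sep'].last_valid_index() - 2

# Overwrite that single row with the Sep value.
df.loc[ids, 'FC'] = df.loc[ids, 'Sep']
```

Note that only one row is taken from 'Sep' here; every other row keeps its 'Dec' value.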

Upvotes: 1

Chris

Reputation: 16147

You can use numpy's `np.where` to take the Sep value at the row whose index is 3 before the first null index, and fill the rest with Dec:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Date':['2019-09-23','2019-09-24','2019-09-25','2019-09-26','2019-09-27','2019-09-28','2019-09-29'],
                  'Sep':[1,10,5,np.nan,np.nan,np.nan,np.nan],
                  'Dec':[2,8,4,7,9,1,5]})


df['First_Contract'] = np.where(df.index==df.Sep.isnull().idxmax()-3, df.Sep, df.Dec)
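
The comparison `df.index == ... - 3` relies on the default integer index. If the frame is indexed by something else (for example the dates), a position-based mask is one way around that. The following is a sketch under that assumption, with `first_nan_pos` and `mask` being names introduced here for illustration; it also assumes 'Sep' contains at least one NaN with at least three valid values before it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['2019-09-23', '2019-09-24', '2019-09-25', '2019-09-26',
                            '2019-09-27', '2019-09-28', '2019-09-29'],
                   'Sep': [1, 10, 5, np.nan, np.nan, np.nan, np.nan],
                   'Dec': [2, 8, 4, 7, 9, 1, 5]}).set_index('Date')

# Position (not label) of the first missing Sep value.
first_nan_pos = df['Sep'].isna().to_numpy().argmax()

# True only at the row three positions before the first NaN.
mask = np.arange(len(df)) == first_nan_pos - 3

# Take Sep where the mask is True, Dec everywhere else.
df['First_Contract'] = np.where(mask, df['Sep'], df['Dec'])
```

Because 'Sep' is a float column, the resulting column is float as well; append `.astype(int)` if integers are needed.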

Upvotes: 1

PieCot

Reputation: 3639

You can use pd.concat and last_valid_index to create your column:

df['First_contract'] = pd.concat((
    df.Sep.iloc[:df.Sep.last_valid_index() - 1], df.Dec.iloc[df.Sep.last_valid_index() - 1:]
)).astype(int)

Complete code (I replaced strings 'NaN' with np.nan in Sep column; it is not needed if they are already NaN):

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date':['2019-09-23','2019-09-24','2019-09-25','2019-09-26','2019-09-27','2019-09-28','2019-09-29'],
                  'Sep':[1,10,5, 'NaN','NaN','NaN','NaN'],
                  'Dec':[2,8,4,7,9,1,5]})
df['Sep'] = df['Sep'].replace({'NaN': np.nan})
df['First_contract'] = pd.concat((
    df.Sep.iloc[:df.Sep.last_valid_index() - 1], df.Dec.iloc[df.Sep.last_valid_index() - 1:]
)).astype(int)

Output:

         Date   Sep  Dec  First_contract
0  2019-09-23   1.0    2               1
1  2019-09-24  10.0    8               8
2  2019-09-25   5.0    4               4
3  2019-09-26   NaN    7               7
4  2019-09-27   NaN    9               9
5  2019-09-28   NaN    1               1
6  2019-09-29   NaN    5               5
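
One thing worth checking against your real data: with only three valid 'Sep' values, taking every 'Sep' row up to the third-last valid one and taking just that single row give the same result. With a longer run they differ. The sketch below uses made-up data (five valid 'Sep' values, not from the question) and assumes the default integer index; `up_to` and `single_row` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five valid Sep values followed by NaN.
df = pd.DataFrame({'Sep': [1, 10, 5, 6, 7, np.nan, np.nan],
                   'Dec': [2, 8, 4, 7, 9, 1, 5]})

last = df['Sep'].last_valid_index()  # label of the last valid Sep row

# concat approach: every Sep row up to and including the third-last valid one,
# then Dec from the next row on.
up_to = pd.concat((df['Sep'].iloc[:last - 1],
                   df['Dec'].iloc[last - 1:])).astype(int)

# Single-row approach: Dec everywhere except the third-last valid Sep row.
single_row = df['Dec'].astype(float)
single_row.loc[last - 2] = df.loc[last - 2, 'Sep']
single_row = single_row.astype(int)

print(up_to.tolist())
print(single_row.tolist())
```

Which of the two matches the intended 'First_Contract' depends on how the requirement in the question is read.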

Upvotes: 1
