Reputation: 11907
I have a dataframe like this
A B
2018-02-01 00:00:00 5.592860 2.789900
2018-02-01 00:15:00 5.288981 2.054017
2018-02-01 00:30:00 5.319665 2.232686
2018-02-01 00:45:00 5.198657 2.236154
2018-02-01 01:00:00 5.018134 2.064312
The Column A will not have any missing values, but column B will be having.
I have another list like this of length 12 named forecasts
[0.09545173 0.09946214 0.10596157 0.12075519 0.14446978 0.16848372
0.20479251 0.23742175 0.26723814 0.29389328 0.30628437 0.3140854 ]
I want to loop through each row in the dataframe and check whether next 12 rows have any nan
. If nan
is present, replace that value from the corresponding index from the list.
If the 2nd row from the current index is nan
then replace that nan
by forecasts[2]
To make things more clear,
I will have a dataframe with data like I said the question. There may be misses in the column B but not A. I will have a list of 12 location for each timestamp. The list being named forecast will have the forecasted value of present timestamp in forecasts[0] and 11th timestamp from now in forecast[11]. I want to loop though each timestamp in the dataset, check whether any nan is present in the next 12 locations of the dataframe in the column B.
If there are nan, then replace it with the forecast.
How can I do this easily with pandas.?
Upvotes: 1
Views: 2580
Reputation: 658
Well this is the basic issue to handle or work with missing or unknown values. You can use fillna() function to fill missing values with default value that you want.
e.g: If df1 is your dataframe containing missing values in multiple columns.
for column in df1:
print("column ",column)
df1[column] = df1[column].fillna(0.12345)
To check fillna syntax and example check https://kite.com/python/docs/pandas.core.frame.DataFrame.fillna
You can also use pandas isna() function to check where values are missing.
Upvotes: -1
Reputation: 862396
You can use:
import pandas as pd
import numpy as np
temp=u"""A;B
2018-02-01 00:00:00;5.592860;2.789900
2018-02-01 00:15:00;5.288981;NaN
2018-02-01 00:30:00;5.319665;2.232686
2018-02-01 00:45:00;5.198657;2.236154
2018-02-01 01:00:00;5.018134;2.064312
2018-02-01 01:15:00;5.018134;NaN
"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", parse_dates=True)
print (df)
A B
2018-02-01 00:00:00 5.592860 2.789900
2018-02-01 00:15:00 5.288981 NaN
2018-02-01 00:30:00 5.319665 2.232686
2018-02-01 00:45:00 5.198657 2.236154
2018-02-01 01:00:00 5.018134 2.064312
2018-02-01 01:15:00 5.018134 NaN
L = [0.09545173, 0.09946214, 0.10596157]
r = int(len(df) / len(L))
print (r)
2
s = pd.Series(np.tile(np.array(L), r), index=df.index)
print (s)
2018-02-01 00:00:00 0.095452
2018-02-01 00:15:00 0.099462
2018-02-01 00:30:00 0.105962
2018-02-01 00:45:00 0.095452
2018-02-01 01:00:00 0.099462
2018-02-01 01:15:00 0.105962
dtype: float64
df['B'] = df['B'].fillna(s)
print (df)
A B
2018-02-01 00:00:00 5.592860 2.789900
2018-02-01 00:15:00 5.288981 0.099462
2018-02-01 00:30:00 5.319665 2.232686
2018-02-01 00:45:00 5.198657 2.236154
2018-02-01 01:00:00 5.018134 2.064312
2018-02-01 01:15:00 5.018134 0.105962
Upvotes: 2
Reputation: 71560
OR a one-liner list comprehension:
df['B'] = [l[i] if type(v)==type(np.nan) else v for i,v in enumerate(df['B'].tolist())]
Upvotes: 1