Reputation: 4807
I have a dataframe that looks like the following, with multiple blank rows:
date hour Temp

6/1/2017 0:00 64
6/7/2017 22:00 63
6/7/2017 23:00 62

6/2/2017 0:00 62
6/2/2017 1:00 60
6/8/2017 23:00 65

6/6/2017 0:00 64
6/6/2017 1:00 64
6/12/2017 22:00 78
6/12/2017 23:00 76
I want to create the following:
date hour Temp newDate
6/1/2017 0:00 64 6/1/2017
6/7/2017 22:00 63 6/1/2017
6/7/2017 23:00 62 6/1/2017
6/2/2017 0:00 62 6/2/2017
6/2/2017 1:00 60 6/2/2017
6/8/2017 23:00 65 6/2/2017
6/6/2017 0:00 64 6/6/2017
6/6/2017 1:00 64 6/6/2017
6/12/2017 22:00 78 6/6/2017
6/12/2017 23:00 76 6/6/2017
The new column should hold the first date that appears in the date column right after each blank row.
I am trying to do this with a for loop, but is there a better way?
Upvotes: 1
Views: 43
Reputation: 164623
There will, no doubt, be a smart Pandas solution. But here's a solution using itertools.groupby. I assume that your blank rows consist of NaN items, and leverage the fact that np.nan == np.nan returns False, so the key x == x is False for blank rows and True for everything else.
from itertools import chain, groupby
# group consecutive rows by whether the date is non-NaN (x == x is False only for NaN)
grouper = groupby(df['date'], key=lambda x: x == x)
# for each run, repeat its first item once per row in the run, then chain the runs
chainer = chain.from_iterable([next(j)] * (len(list(j)) + 1) for _, j in grouper)
# assign to a new column
df['newDate'] = list(chainer)
print(df)
date hour Temp newDate
0 NaN NaN NaN NaN
1 6/1/2017 0:00 64.0 6/1/2017
2 6/7/2017 22:00 63.0 6/1/2017
3 6/7/2017 23:00 62.0 6/1/2017
4 NaN NaN NaN NaN
5 6/2/2017 0:00 62.0 6/2/2017
6 6/2/2017 1:00 60.0 6/2/2017
7 6/8/2017 23:00 65.0 6/2/2017
8 NaN NaN NaN NaN
9 6/6/2017 0:00 64.0 6/6/2017
10 6/6/2017 1:00 64.0 6/6/2017
11 6/12/2017 22:00 78.0 6/6/2017
12 6/12/2017 23:00 76.0 6/6/2017
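For comparison, here is one possible vectorised sketch of the same idea in plain Pandas. It is not the approach used above, only an illustration under the same assumption that blank rows are all-NaN. The helper name first_date_per_block is hypothetical, and the sample frame is a shortened copy of the data in the question.

```python
import numpy as np
import pandas as pd

# Shortened sample data from the question; blank rows are ASSUMED to be NaN.
df = pd.DataFrame({
    "date": [np.nan, "6/1/2017", "6/7/2017", "6/7/2017",
             np.nan, "6/2/2017", "6/2/2017", "6/8/2017"],
    "hour": [np.nan, "0:00", "22:00", "23:00",
             np.nan, "0:00", "1:00", "23:00"],
    "Temp": [np.nan, 64, 63, 62, np.nan, 62, 60, 65],
})


def first_date_per_block(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: label each row with the first date of its block."""
    is_blank = dates.isna()
    # A block starts on a non-blank row whose previous row is blank
    # (the very first row is treated as if it followed a blank row).
    starts = ~is_blank & is_blank.shift(fill_value=True)
    # Keep the date only on block starts, carry it forward,
    # then restore NaN on the blank rows themselves.
    return dates.where(starts).ffill().where(~is_blank)


df["newDate"] = first_date_per_block(df["date"])
print(df)
```

Whether this is preferable to the itertools version is a judgment call: it avoids a Python-level loop over the rows, but it relies on the same NaN assumption and would need adjusting if the blank rows were empty strings instead.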
Upvotes: 1