Reputation: 55
import numpy as np
import pandas as pd
This is my data:
ts = pd.DataFrame([0,1,2,3,4,5,6,7,8,9,10,11,12])
ts.columns = ["TS"]
start_df = pd.Series([1,3,6])
end_df = pd.Series([2,7,10])
I have created the following function to clean up my loop, and a for loop that iterates over each element in ts and saves the output of check_if.
def check_if(start, ts, end):
    if start <= ts <= end:
        return 1
    else:
        return 0
ts["Flagg"] = np.nan
for ix, hour in enumerate(ts["TS"]):
    for jx, end in enumerate(end_df):
        ts["Flagg"][ix] = check_if(start_df[jx], hour, end_df[jx])
The problem is that my resulting ts["Flagg"] only saves the result of the last iteration, where start_df == 6 and end_df == 10. Is my logic in the loop completely off?
Edit:
Expected output [0,1,1,1,1,2,2,1,1,1,0,0] in column ts["Flagg"].
Upvotes: 2
Views: 238
Reputation: 564
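You can build the column (as a Series or list) and then assign it, as jezrael pointed out, or create the column with initial values and then update them in the loop. Start from 0 and add each check_if result, so every matching interval is counted instead of being overwritten by the next one:
ts["Flagg"] = 0  # start every row at zero
for ix, hour in enumerate(ts["TS"]):
    for jx, end in enumerate(end_df):
        # add to the running count rather than replacing it
        ts.loc[ix, "Flagg"] += check_if(start_df[jx], hour, end)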
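To see why plain assignment in the inner loop keeps only the last interval, here is a minimal sketch that uses ordinary lists instead of the DataFrame. The names hours, starts, ends, overwritten and counted are introduced only for this illustration; the data mirrors the question.

```python
# Plain-list stand-ins for ts["TS"], start_df and end_df (illustrative names).
hours = list(range(13))
starts = [1, 3, 6]
ends = [2, 7, 10]

def check_if(start, ts, end):
    # 1 if ts lies inside the closed interval [start, end], else 0
    return 1 if start <= ts <= end else 0

# Overwriting: every inner iteration replaces the previous result,
# so only the check against the last interval (6 to 10) survives.
overwritten = []
for hour in hours:
    flag = 0
    for start, end in zip(starts, ends):
        flag = check_if(start, hour, end)
    overwritten.append(flag)

# Accumulating: every matching interval adds 1 to the count.
counted = []
for hour in hours:
    flag = 0
    for start, end in zip(starts, ends):
        flag += check_if(start, hour, end)
    counted.append(flag)

print(overwritten)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
print(counted)      # [0, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

The only difference between the two loops is = versus +=, which is what decides whether earlier intervals are remembered.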
Upvotes: 0
Reputation: 863531
Use between in a list comprehension to build a list of boolean masks, then sum them to count the True values (True is treated as 1). Thanks @RafaelC for the improvement:
ts['new'] = np.sum([ts['TS'].between(x, y) for x, y in zip(start_df, end_df)], axis=0)
print(ts)
    TS  new
0    0    0
1    1    1
2    2    1
3    3    1
4    4    1
5    5    1
6    6    2
7    7    2
8    8    1
9    9    1
10  10    1
11  11    0
12  12    0
Details:
print([ts['TS'].between(x, y) for x, y in zip(start_df, end_df)])
[0     False
1      True
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
Name: TS, dtype: bool, 0     False
1     False
2     False
3      True
4      True
5      True
6      True
7      True
8     False
9     False
10    False
11    False
12    False
Name: TS, dtype: bool, 0     False
1     False
2     False
3     False
4     False
5     False
6      True
7      True
8      True
9      True
10     True
11    False
12    False
Name: TS, dtype: bool]
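The same count can also be sketched with NumPy broadcasting instead of a list comprehension. This is an alternative to the between approach above, not part of it. The arrays ts_values, starts and ends are illustrative stand-ins for ts["TS"], start_df and end_df.

```python
import numpy as np

# Illustrative stand-ins for ts["TS"], start_df and end_df.
ts_values = np.arange(13)
starts = np.array([1, 3, 6])
ends = np.array([2, 7, 10])

# Compare every value against every interval at once:
# a (13, 1) column against (3,) bounds broadcasts to shape (13, 3).
inside = (ts_values[:, None] >= starts) & (ts_values[:, None] <= ends)

# Each row holds one boolean per interval; summing counts the True values.
counts = inside.sum(axis=1)
print(counts.tolist())  # [0, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

Assuming the DataFrame from the question, the result could be stored with ts['new'] = counts. The intermediate array has one entry per value and interval pair, so for very large inputs the memory cost may matter.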
Upvotes: 2