Reputation: 845
I have a list of numbers from 1 to 24
hour = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]
I want to create four separate lists (a, b, c, d)
based on respective conditions:
a = (22<= number <6)
b = (6<= number <9 or 12<= number <18)
c = (9<= number <12)
d = (18<= number <22
if condition is True
then 1 else
0
I can create the a
column but with rest of them I am facing issue,
import pandas as pd
df = pd.read_excel('mask.xlsx')
hour = df['Hours']
df['a'] = [1 if i>=22 or i<6 else 0 for i in hour]
b=[]
for i,j in zip(hour, df['a']):
if i <= 6 and i < 9:
if i<=12 and i<18:
if j==0:
b.append(1)
else:
b.append(0)
df['b'] = b
ValueError: Length of values does not match length of index
Upvotes: 0
Views: 103
Reputation: 35686
I would recommend vectorized comparisons directly with the Series. Also taking advantage of Series.between
and the inclusive
argument to make loweround inclusive:
df['a'] = (df['Hours'].ge(22) | df['Hours'].lt(6)).astype(int)
df['b'] = (df['Hours'].between(6, 9, inclusive='left') |
df['Hours'].between(12, 18, inclusive='left')).astype(int)
df['c'] = df['Hours'].between(9, 12, inclusive='left').astype(int)
df['d'] = df['Hours'].between(18, 22, inclusive='left').astype(int)
Notice: Both compound conditions should be or not and.
a
is Hours more than 22 or less than 6.b
are hours between [6, 9) or [12, 18).Perhaps a bit overkill here, but can also use pd.get_dummies
after pd.cut
with duplicate labels then join
back:
df = df.join(
pd.get_dummies(
pd.cut(df['Hours'],
bins=[1, 6, 9, 12, 18, 22, 25],
labels=['a', 'b', 'c', 'b', 'd', 'a'],
ordered=False,
right=False)
)
)
Both Options Produce:
df
:
Hours a b c d
0 1 1 0 0 0
1 2 1 0 0 0
2 3 1 0 0 0
3 4 1 0 0 0
4 5 1 0 0 0
5 6 0 1 0 0
6 7 0 1 0 0
7 8 0 1 0 0
8 9 0 0 1 0
9 10 0 0 1 0
10 11 0 0 1 0
11 12 0 1 0 0
12 13 0 1 0 0
13 14 0 1 0 0
14 15 0 1 0 0
15 16 0 1 0 0
16 17 0 1 0 0
17 18 0 0 0 1
18 19 0 0 0 1
19 20 0 0 0 1
20 21 0 0 0 1
21 22 1 0 0 0
22 23 1 0 0 0
23 24 1 0 0 0
Upvotes: 3
Reputation: 22503
Using np.select
and pd.crosstab
:
df = pd.DataFrame({"Hours": hour})
df["g"] = (np.select([(df.Hours<6)|(df.Hours>=22),
(df.Hours<12)&(df.Hours>=9),
(df.Hours<22)&(df.Hours>=18),],
list("acd"), "b"))
print (pd.crosstab(df["Hours"], df["g"]))
g a b c d
Hours
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 0 0 0
5 1 0 0 0
6 0 1 0 0
7 0 1 0 0
8 0 1 0 0
9 0 0 1 0
10 0 0 1 0
11 0 0 1 0
12 0 1 0 0
13 0 1 0 0
14 0 1 0 0
15 0 1 0 0
16 0 1 0 0
17 0 1 0 0
18 0 0 0 1
19 0 0 0 1
20 0 0 0 1
21 0 0 0 1
22 1 0 0 0
23 1 0 0 0
24 1 0 0 0
Upvotes: 3
Reputation: 59579
You can use np.select
implement if-elif-else logic to assign the group and then form the dummies from that Series.
import pandas as pd
import numpy as np
conds = [df['Hours'].lt(6) | df['Hours'].ge(22),
(df['Hours'].ge(6) & df['Hours'].lt(9)) | (df['Hours'].ge(12) & df['Hours'].lt(18)),
df['Hours'].ge(9) & df['Hours'].lt(12),
df['Hours'].ge(18) & df['Hours'].lt(22)
]
choices = ['a', 'b', 'c', 'd']
dums = pd.get_dummies(pd.Series(np.select(conds, choices), index=df.index))
df = pd.concat([df, dums], axis=1)
Hours a b c d
0 1 1 0 0 0
1 2 1 0 0 0
2 3 1 0 0 0
3 4 1 0 0 0
4 5 1 0 0 0
5 6 0 1 0 0
6 7 0 1 0 0
7 8 0 1 0 0
8 9 0 0 1 0
9 10 0 0 1 0
10 11 0 0 1 0
11 12 0 1 0 0
12 13 0 1 0 0
13 14 0 1 0 0
14 15 0 1 0 0
15 16 0 1 0 0
16 17 0 1 0 0
17 18 0 0 0 1
18 19 0 0 0 1
19 20 0 0 0 1
20 21 0 0 0 1
21 22 1 0 0 0
22 23 1 0 0 0
23 24 1 0 0 0
Upvotes: 3