I am new to Python, and therefore to pandas DataFrames as well. Let's say that I have the following data set:
import pandas as pd

d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'b': [4, 4, 4, 5, 5, 5, 6, 6, 6]}
df = pd.DataFrame(data=d)
print(df)
a b
0 1 4
1 1 4
2 1 4
3 2 5
4 2 5
5 2 5
6 3 6
7 3 6
8 3 6
What I want to do is create new columns, say b_1, b_2 and b_3, based on the information in columns a and b: b_i should equal b in rows where a == i, and 0 otherwise. The final data should look like this:
a b b_1 b_2 b_3
0 1 4 4 0 0
1 1 4 4 0 0
2 1 4 4 0 0
3 2 5 0 5 0
4 2 5 0 5 0
5 2 5 0 5 0
6 3 6 0 0 6
7 3 6 0 0 6
8 3 6 0 0 6
In Stata this is achieved with the following loop:
forvalues i=1(1)3{
gen b_`i'=b if a==`i'
replace b_`i'=0 if b_`i'==.
}
Is there a similar way of doing this in Python? Thanks in advance.
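Use DataFrame.join with Series.unstack and DataFrame.add_prefix. set_index('a', append=True) adds a as a second index level next to the original row index; unstack(fill_value=0) moves that level into columns, so each value of a gets its own column, with 0 in rows that have a different a; add_prefix('b_') renames those columns to b_1, b_2 and b_3; and join aligns the result back onto the original rows by index: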
df = df.join(df.set_index('a', append=True)['b'].unstack(fill_value=0).add_prefix('b_'))
print(df)
a b b_1 b_2 b_3
0 1 4 4 0 0
1 1 4 4 0 0
2 1 4 4 0 0
3 2 5 0 5 0
4 2 5 0 5 0
5 2 5 0 5 0
6 3 6 0 0 6
7 3 6 0 0 6
8 3 6 0 0 6
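For readers coming from Stata, the loop in the question can also be translated almost line by line. This is a minimal sketch, not part of the original answer: it rebuilds the question's df and uses Series.where, which keeps b where a == i and substitutes 0 everywhere else, so the gen and replace steps collapse into one assignment. The name looped is only illustrative.

```python
import pandas as pd

# Same sample data as in the question
d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'b': [4, 4, 4, 5, 5, 5, 6, 6, 6]}
df = pd.DataFrame(data=d)

looped = df.copy()
for i in range(1, 4):  # forvalues i=1(1)3
    # gen b_`i'=b if a==`i'  plus  replace b_`i'=0 if b_`i'==.
    # where() keeps b in rows where a == i and puts 0 in all other rows
    looped[f'b_{i}'] = looped['b'].where(looped['a'] == i, 0)

print(looped)
```

For this data the loop gives the same b_1, b_2 and b_3 values as the join/unstack one-liner. Like the Stata code, it hardcodes the values 1 to 3; iterating over sorted(df['a'].unique()) instead would pick up whatever values a actually contains. The unstack version avoids the explicit loop and creates one column per distinct value of a automatically.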