Reputation: 28630
I have the following dataframe with numerous rows. I would like to take the multiple columns and condense it to one column.
Player | 0 | 1 | 2 | 3 | 4
Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl
Tony Gonzalez | 1st Tm All-Conf. | AP 1st Tm | None | None | None
... | ... | ... | ... | ... | ...
I'm trying to figure out how to restructure it so the awards are all in one column. So it would look like a dataframe as follows:
Player | awardID
Edgerrin James | 1st Tm All-Conf.
Edgerrin James | AP 1st Tm
Edgerrin James | FW 1st Tm
Edgerrin James | SN 1st Tm
Edgerrin James | Pro Bowl
Tony Gonzalez | 1st Tm All-Conf.
Tony Gonzalez | AP 1st Tm
If the 'None' cells are also included, I'd be fine with that because I know how to filter those out after, but can't figure out the first part.
Upvotes: 0
Views: 128
Reputation: 466
solution without panda
first save any row in a string like s
def mylist(string):
string = string.split('|')
length = len(string)-1
for i in range(length):
print string[0],string[i+1:i+2],'\n'
s1 = 'Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl'
s2 = 'Tony Gonzalez | 1st Tm All-Conf. | AP 1st Tm | None | None | None'
mylist(s1)
mylist(s2)
ouput:
Edgerrin James [' 1st Tm All-Conf. ']
Edgerrin James [' AP 1st Tm ']
Edgerrin James [' FW 1st Tm ']
Edgerrin James [' SN 1st Tm ']
Tony Gonzalez [' 1st Tm All-Conf. ']
Tony Gonzalez [' AP 1st Tm ']
Tony Gonzalez [' None ']
Tony Gonzalez [' None ']
Tony Gonzalez [' None']
do this for all player and rows
Upvotes: 1
Reputation: 76947
Use set_index
on Player
and stack
In [750]: df.set_index('Player').stack().reset_index(name='awardID').drop('level_1', 1)
Out[750]:
Player awardID
0 Edgerrin James 1st Tm All-Conf.
1 Edgerrin James AP 1st Tm
2 Edgerrin James FW 1st Tm
3 Edgerrin James SN 1st Tm
4 Edgerrin James Pro Bowl
5 Tony Gonzalez 1st Tm All-Conf.
6 Tony Gonzalez AP 1st Tm
7 Tony Gonzalez None
8 Tony Gonzalez None
9 Tony Gonzalez None
Optinally, remove None
using query
In [757]: (df.set_index('Player')
.stack()
.reset_index(name='awardID')
.drop('level_1', 1)
.query('awardID != "None"'))
Out[757]:
Player awardID
0 Edgerrin James 1st Tm All-Conf.
1 Edgerrin James AP 1st Tm
2 Edgerrin James FW 1st Tm
3 Edgerrin James SN 1st Tm
4 Edgerrin James Pro Bowl
5 Tony Gonzalez 1st Tm All-Conf.
6 Tony Gonzalez AP 1st Tm
Upvotes: 2