chitown88
chitown88

Reputation: 28630

Python - take rows of data and place into single column

I have the following dataframe with numerous rows. I would like to take the multiple columns and condense it to one column.

Player         |    0             |     1     |     2     | 3         |  4
Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl
Tony Gonzalez  | 1st Tm All-Conf. | AP 1st Tm | None      | None      | None
...            |  ...             | ...       | ...       | ...       | ...

I'm trying to figure out how to restructure it so the awards are all in one column. So it would look like a dataframe as follows:

Player         | awardID
Edgerrin James | 1st Tm All-Conf.
Edgerrin James | AP 1st Tm
Edgerrin James | FW 1st Tm 
Edgerrin James | SN 1st Tm
Edgerrin James | Pro Bowl
Tony Gonzalez  | 1st Tm All-Conf.
Tony Gonzalez  | AP 1st Tm

If the 'None' cells are also included, I'd be fine with that because I know how to filter those out after, but can't figure out the first part.

Upvotes: 0

Views: 128

Answers (2)

keyvan vafaee
keyvan vafaee

Reputation: 466

solution without panda
first save any row in a string like s

  def mylist(string):

   string = string.split('|')
   length = len(string)-1
   for i in range(length):
    print string[0],string[i+1:i+2],'\n'

  s1 = 'Edgerrin James | 1st Tm All-Conf. | AP 1st Tm | FW 1st Tm | SN 1st Tm | Pro Bowl'

  s2 = 'Tony Gonzalez  | 1st Tm All-Conf. | AP 1st Tm | None      | None      | None'


 mylist(s1)
 mylist(s2)

ouput:

Edgerrin James  [' 1st Tm All-Conf. '] 

Edgerrin James  [' AP 1st Tm '] 

Edgerrin James  [' FW 1st Tm '] 

Edgerrin James  [' SN 1st Tm '] 

Tony Gonzalez   [' 1st Tm All-Conf. '] 

Tony Gonzalez   [' AP 1st Tm '] 

Tony Gonzalez   [' None      '] 

Tony Gonzalez   [' None      '] 

Tony Gonzalez   [' None'] 

do this for all player and rows

Upvotes: 1

Zero
Zero

Reputation: 76947

Use set_index on Player and stack

In [750]: df.set_index('Player').stack().reset_index(name='awardID').drop('level_1', 1)
Out[750]:
           Player           awardID
0  Edgerrin James  1st Tm All-Conf.
1  Edgerrin James         AP 1st Tm
2  Edgerrin James         FW 1st Tm
3  Edgerrin James         SN 1st Tm
4  Edgerrin James          Pro Bowl
5   Tony Gonzalez  1st Tm All-Conf.
6   Tony Gonzalez         AP 1st Tm
7   Tony Gonzalez              None
8   Tony Gonzalez              None
9   Tony Gonzalez              None

Optinally, remove None using query

In [757]: (df.set_index('Player')
             .stack()
             .reset_index(name='awardID')
             .drop('level_1', 1)
             .query('awardID != "None"'))
Out[757]:
           Player           awardID
0  Edgerrin James  1st Tm All-Conf.
1  Edgerrin James         AP 1st Tm
2  Edgerrin James         FW 1st Tm
3  Edgerrin James         SN 1st Tm
4  Edgerrin James          Pro Bowl
5   Tony Gonzalez  1st Tm All-Conf.
6   Tony Gonzalez         AP 1st Tm

Upvotes: 2

Related Questions