jonas
jonas

Reputation: 13969

Pandas stack/groupby to make a new dataframe

I have a problem of creating and rearrange a dataset. I Looked at the pandas groupby function and think it might help me do it but I am to inexperienced to make it happen. I have created an example of my problem below: my df:

     vehicle  color  a  b  c  d  A1  A2  A3  B1  B2  B3  C1  C2  C3  D1  D2  D3
resp                                                                           
1       bike  green  5  4  1  3   3   4   5   3   5   3 NaN NaN NaN NaN NaN NaN
2       walk    red  5  3  3  3   4   5   3   3   5   4 NaN NaN NaN NaN NaN NaN
3        car  green  4  2  3  3   4   3   5   4   5   5 NaN NaN NaN NaN NaN NaN
4        car   blue  4  5  4  4 NaN NaN NaN NaN NaN NaN   5   5   5   3   3   4
5        bus  black  2  4  4  3 NaN NaN NaN   2   3   3   2   2   1 NaN NaN NaN
6        car    red  4  2  3  3   3   4   4 NaN NaN NaN   4   4   4 NaN NaN NaN
7        bus   blue  5  5  2  3   3   3   5   4   3   2 NaN NaN NaN NaN NaN NaN
8       walk    red  3  3  4  3 NaN NaN NaN   5   5   5   5   3   3 NaN NaN NaN
9        car   blue  5  3  4  3   3   3   3 NaN NaN NaN   4   3   4 NaN NaN NaN

The dataset contains respondents and answers to a questionare. What I would like to do is to make a new dataframe with resp as index and the data from how the respondents answered rearranged. The data in columns a,b,c,d, vehicle and color are stacked for the respondents (Hope thats the right way to express it) in the new dataframe. Also the values from columns A to C are in the new frame under columns BL_val. Only the data that corresponds from Capital letter (A1-D3) to small letter (a,b,c,d) are filled in. The rest are NAN.

I would like to create a new dataframe from this and it shoul look like:

ds:

     vehicle  color sl  sl_val  BL_val1  BL_val2  BL_val3
resp                                                     
1       bike  green  a       5        3        4        5
1       bike  green  b       4        3        5        3
1       bike  green  c       1      NaN      NaN      NaN
1       bike  green  d       3      NaN      NaN      NaN
2       walk    red  a       5        4        5        3
2       walk    red  b       3        3        5        4
2       walk    red  c       3      NaN      NaN      NaN
2       walk    red  d       3      NaN      NaN      NaN
3        car  green  a       4        4        3        5
3        car  green  b       2        4        5        5
3        car  green  c       3      NaN      NaN      NaN
3        car  green  d       3      NaN      NaN      NaN
4        car   blue  a       4      NaN      NaN      NaN
4        car   blue  b       5      NaN      NaN      NaN
4        car   blue  c       4        5        5        5
4        car   blue  d       4        3        3        4
5        bus  black  a       2      NaN      NaN      NaN
5        bus  black  b       4        2        3        3
5        bus  black  c       4        2        2        1
5        bus  black  d       3      NaN      NaN      NaN
6        car    red  a       4        3        4        4
6        car    red  b       2      NaN      NaN      NaN
6        car    red  c       3        4        4        4
6        car    red  d       3      NaN      NaN      NaN
7        bus   blue  a       5        3        3        5
7        bus   blue  b       5        4        3        2
7        bus   blue  c       2      NaN      NaN      NaN
7        bus   blue  d       3      NaN      NaN      NaN
8       walk    red  a       3      NaN      NaN      NaN
8       walk    red  b       3        5        5        5
8       walk    red  c       4        5        3        3
8       walk    red  d       3      NaN      NaN      NaN
9        car   blue  a       5        3        3        3
9        car   blue  b       3      NaN      NaN      NaN
9        car   blue  c       4        4        3        4
9        car   blue  d     NaN      NaN      NaN      NaN

I really need some help with this, I cant figure it out!!

Upvotes: 6

Views: 10563

Answers (2)

Andy Hayden
Andy Hayden

Reputation: 375535

A naive way would write a helper function to extract the relevant subDataFrames:

In [11]: def get_letter(df, letter):
             res = df.loc[:, ['vehicle', 'color', letter] + [letter.upper() + str(i) for i in xrange(1, 4)]]
             res.columns = ['vehicle', 'color', 'sl_val', 'BL_val1', 'BL_val2', 'BL_val3']
             res['sl'] = letter
             return res

In [12]: get_letter(df, 'a')
Out[12]: 
     vehicle  color  sl_val  BL_val1  BL_val2  BL_val3 sl
resp                                                     
1       bike  green       5        3        4        5  a
2       walk    red       5        4        5        3  a
3        car  green       4        4        3        5  a
4        car   blue       4      NaN      NaN      NaN  a
5        bus  black       2      NaN      NaN      NaN  a
6        car    red       4        3        4        4  a
7        bus   blue       5        3        3        5  a
8       walk    red       3      NaN      NaN      NaN  a
9        car   blue       5        3        3        3  a

Then concating these gets the result you're after (possibly with a sort):

In [13]: pd.concat(get_letter(df, letter) for letter in 'abcd')

In [14]: pd.concat(get_letter(df, letter) for letter in 'abcd').sort()

Upvotes: 3

DSM
DSM

Reputation: 353119

There might be a more pandorable way to do this, but I find that the pattern of using groupby to make the groups, performing explicit operations on them, and then recombining, is often a simple way to get what I want. Sure, I could spend half an hour coming up with something more elegant, but then I wouldn't have any time to hang out on SO..

Anyway, how about something like this?

df = df.set_index(["resp", "vehicle", "color"])

grouped = df.groupby(lambda x: x[0].lower(), axis=1)

new_grouped = []
for key, group in grouped:
    group.columns = ["sl_val"] + ["BL_val{}".format(i) for i in range(1,4)]
    group["sl"] = key
    new_grouped.append(group)

df2 = pd.concat(new_grouped).reset_index()

df2 = df2.sort(["resp", "vehicle", "color"]).set_index("resp")
df2 = df2[["vehicle", "color", "sl"] + [k for k in df2.columns if "_" in k]]

Starting from

>>> df = df.set_index(["resp", "vehicle", "color"])
>>> df
                    a  b  c  d  A1  A2  A3  B1  B2  B3  C1  C2  C3  D1  D2  D3
resp vehicle color                                                            
1    bike    green  5  4  1  3   3   4   5   3   5   3 NaN NaN NaN NaN NaN NaN
2    walk    red    5  3  3  3   4   5   3   3   5   4 NaN NaN NaN NaN NaN NaN
3    car     green  4  2  3  3   4   3   5   4   5   5 NaN NaN NaN NaN NaN NaN
4    car     blue   4  5  4  4 NaN NaN NaN NaN NaN NaN   5   5   5   3   3   4
5    bus     black  2  4  4  3 NaN NaN NaN   2   3   3   2   2   1 NaN NaN NaN
6    car     red    4  2  3  3   3   4   4 NaN NaN NaN   4   4   4 NaN NaN NaN
7    bus     blue   5  5  2  3   3   3   5   4   3   2 NaN NaN NaN NaN NaN NaN
8    walk    red    3  3  4  3 NaN NaN NaN   5   5   5   5   3   3 NaN NaN NaN
9    car     blue   5  3  4  3   3   3   3 NaN NaN NaN   4   3   4 NaN NaN NaN

We can group by the starting letter in lowercase:

>>> grouped = df.groupby(lambda x: x[0].lower(), axis=1)

Producing a bunch of groups which look like:

>>> next(iter(grouped))
('a',                     a  A1  A2  A3
resp vehicle color               
1    bike    green  5   3   4   5
2    walk    red    5   4   5   3
3    car     green  4   4   3   5
4    car     blue   4 NaN NaN NaN
5    bus     black  2 NaN NaN NaN
6    car     red    4   3   4   4
7    bus     blue   5   3   3   5
8    walk    red    3 NaN NaN NaN
9    car     blue   5   3   3   3)

Then we simply change the names, add the "sl" column, and recombine them using pd.concat. The last bits simply match your desired order.

Final result:

>>> df2
     vehicle  color sl  sl_val  BL_val1  BL_val2  BL_val3
resp                                                     
1       bike  green  a       5        3        4        5
1       bike  green  b       4        3        5        3
1       bike  green  c       1      NaN      NaN      NaN
1       bike  green  d       3      NaN      NaN      NaN
2       walk    red  a       5        4        5        3
2       walk    red  b       3        3        5        4
2       walk    red  c       3      NaN      NaN      NaN
2       walk    red  d       3      NaN      NaN      NaN
3        car  green  a       4        4        3        5
3        car  green  b       2        4        5        5
3        car  green  c       3      NaN      NaN      NaN
3        car  green  d       3      NaN      NaN      NaN
4        car   blue  a       4      NaN      NaN      NaN
4        car   blue  b       5      NaN      NaN      NaN
4        car   blue  c       4        5        5        5
4        car   blue  d       4        3        3        4
5        bus  black  a       2      NaN      NaN      NaN
5        bus  black  b       4        2        3        3
5        bus  black  c       4        2        2        1
5        bus  black  d       3      NaN      NaN      NaN
6        car    red  a       4        3        4        4
6        car    red  b       2      NaN      NaN      NaN
6        car    red  c       3        4        4        4
6        car    red  d       3      NaN      NaN      NaN
7        bus   blue  a       5        3        3        5
7        bus   blue  b       5        4        3        2
7        bus   blue  c       2      NaN      NaN      NaN
7        bus   blue  d       3      NaN      NaN      NaN
8       walk    red  a       3      NaN      NaN      NaN
8       walk    red  b       3        5        5        5
8       walk    red  c       4        5        3        3
8       walk    red  d       3      NaN      NaN      NaN
9        car   blue  a       5        3        3        3
9        car   blue  b       3      NaN      NaN      NaN
9        car   blue  c       4        4        3        4
9        car   blue  d       3      NaN      NaN      NaN

Upvotes: 5

Related Questions