omaraskar
omaraskar

Reputation: 13

Python3 dataframe restructure

I have a dataframe that consists of 2 columns; first columns 'Features', contains 11 unique values stacked vertically 148037 times, with the 'Values' column having different corresponding values.

    Feature         Value
0   Way ID          2781002
1   Highway         motorway_link
2   Toll            yes
3   Reference       n/a
4   Bridge          yes
5   County          n/a
6   Name            n/a
7   Name2           n/a
8   Name3           n/a
9   Name4           n/a
10  Name Type       n/a
11  Way ID          2788620
12  Highway         motorway
13  Toll            yes
14  Reference       A 49
15  Bridge          n/a
16  County          n/a
17  Name            n/a
18  Name2           n/a
19  Name3           n/a
20  Name4           n/a
21  Name Type       n/a
22  Way ID          2954156
... ... ...
148026  Name Type   n/a
148027  Way ID      545273699
148028  Highway     motorway
148029  Toll        yes
148030  Reference   A 4
148031  Bridge      n/a
148032  County      n/a
148033  Name        Autoroute de l'Est
148034  Name2       n/a
148035  Name3       n/a
148036  Name4       n/a
148037  Name Type   n/a

I'd like to have the dataframe set up horizontally in the following way:

    Feature         Value_1           Value_2        Value_3
0   Way ID          2781002           2788620        2954156
1   Highway         motorway_link     motorway       motorway
2   Toll            yes               yes            yes       
3   Reference       n/a               n/a            n/a
4   Bridge          yes               yes            no
... ... ...    
10  Name Type       n/a               n/a            n/a

How can I do this? I tried to use a loop and create a new df/list for every new set of 11 rows, then concatenate them together. Problem is, I can't create new df with different names using '{}'.format(i) syntax within a loop, it doesn't like it for some reason.

Upvotes: 1

Views: 60

Answers (2)

BENY
BENY

Reputation: 323366

By using pivot_table

pd.pivot_table(df,index=df.Feature,columns=df.groupby('Feature').cumcount().add(1),values='Value',aggfunc='sum').add_prefix('Value_')
Out[779]: 
                 Value_1   Value_2  Value_3
Feature                                    
Bridge               yes       n/a     None
County               n/a       n/a     None
Highway    motorway_link  motorway     None
Name                 n/a       n/a     None
Name2                n/a       n/a     None
Name3                n/a       n/a     None
Name4                n/a       n/a     None
NameType             n/a       n/a     None
Reference            n/a       A49     None
Toll                 yes       yes     None
WayID            2781002   2788620  2954156

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153510

Try this:

df.groupby('Feature')['Value'].apply(lambda x: pd.Series(x.tolist())).unstack().add_prefix('Value_')

Using groupby, apply and unstack.

Upvotes: 1

Related Questions