user2293224

Reputation: 2220

Python pandas: converting column values into other columns

I have a dataframe that looks like this:

df:

    Review_Text                                                 Noun                                        Thumbups
    Would be nice to be able to import files from ...   [My, Tracks, app, phone, Google, Drive, import...   1.0
    No Offline Maps! It used to have offline maps ...   [Offline, Maps, menu, option, video, exchange,...   18.0
    Great application. Designed with very well tho...   [application, application]                          16.0
    Great App. Nice and simple but accurate. Wish ...   [Great, App, Nice, Exported]                        0.0
    Save For Offline - This does not work. The rou...   [Save, Offline, route, filesystem]                  12.0
    Since latest update app will not run. Subscrip...   [update, app, Subscription, March, application]     9.0
    Great app. Love it! And all the things it does...   [Great, app, Thank, work]                           1.0
    I have paid for subscription but keeps telling...   [subscription, trial, period]                       0.0
    Error: The route cannot be save for no locatio...   [Error, route, i, GPS]                              0.0
    When try to restore my tracks it says "unable ...   [try, file, locally-1]                              0.0
    Was a good app but since the update it only re...   [app, update, metre]                                2.0

Based on the values in the 'Noun' column, I want to create new columns. For example, every noun in the first row's list becomes a column, and that column holds the row's 'Thumbups' value. If a column with that name already exists in the dataframe, the 'Thumbups' value is added to the existing value in that column.

I tried to implement this with pivot_table:

pd.pivot_table(latest_review,columns='Noun',values='Thumbups')

But I got the following error:

TypeError: unhashable type: 'list'

Could anyone help me fix this?
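The TypeError arises because pivot_table uses each value of the columns field as a grouping key, and a Python list is unhashable. A minimal sketch of one workaround, assuming pandas 0.25 or later (which added DataFrame.explode) and using a small hypothetical sample rather than the real data: explode the lists to one noun per row, then pivot with aggfunc='sum' so that a noun repeated within a review has its Thumbups added together.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (list-valued 'Noun' cells)
df = pd.DataFrame({
    'Review_Text': ['Great app', 'No offline maps', 'Great application'],
    'Noun': [['Great', 'app'], ['Offline', 'Maps'], ['application', 'application']],
    'Thumbups': [1.0, 18.0, 16.0],
})

# A list cannot be a grouping key because it is unhashable, so first
# explode() each list into one row per noun; reset_index() keeps the
# original row number in a column named 'index'.
long_df = df.explode('Noun').reset_index()

# One row per review, one column per distinct noun. aggfunc='sum' adds the
# Thumbups of a noun that repeats within a review; absent nouns become 0.
wide = long_df.pivot_table(index='index', columns='Noun',
                           values='Thumbups', aggfunc='sum', fill_value=0)
print(wide)
```

Here the third review lists 'application' twice, so its 'application' cell is 16 + 16 = 32.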

Upvotes: 0

Views: 79

Answers (2)

jezrael

Reputation: 863711

Use Series.str.join to join each list into one string, Series.str.get_dummies to create an indicator column per noun, and then multiply by the Thumbups column with DataFrame.mul:

df1 = df['Noun'].str.join('|').str.get_dummies().mul(df['Thumbups'], axis=0)

print (df1)
   App  Drive  Error  Exported  GPS  Google  Great   Maps  March    My  Nice  \
0   0.0   10.0    0.0       0.0  0.0    10.0    0.0    0.0    0.0  10.0   0.0   
1   0.0    0.0    0.0       0.0  0.0     0.0    0.0  180.0    0.0   0.0   0.0   
2   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0    0.0   0.0   0.0   
3   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0    0.0   0.0   0.0   
4   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0    0.0   0.0   0.0   
5   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0   90.0   0.0   0.0   
6   0.0    0.0    0.0       0.0  0.0     0.0   10.0    0.0    0.0   0.0   0.0   
7   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0    0.0   0.0   0.0   
8   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0    0.0   0.0   0.0   
9   0.0    0.0    0.0       0.0  0.0     0.0    0.0    0.0    0.0   0.0   0.0   
10  NaN    NaN    NaN       NaN  NaN     NaN    NaN    NaN    NaN   NaN   NaN   

    Offline   Save  Subscription  Thank  Tracks   app  application  exchange  \
0       0.0    0.0           0.0    0.0    10.0  10.0          0.0       0.0   
1     180.0    0.0           0.0    0.0     0.0   0.0          0.0     180.0   
2       0.0    0.0           0.0    0.0     0.0   0.0        160.0       0.0   
3       0.0    0.0           0.0    0.0     0.0   0.0          0.0       0.0   
4     120.0  120.0           0.0    0.0     0.0   0.0          0.0       0.0   
5       0.0    0.0          90.0    0.0     0.0  90.0         90.0       0.0   
6       0.0    0.0           0.0   10.0     0.0  10.0          0.0       0.0   
7       0.0    0.0           0.0    0.0     0.0   0.0          0.0       0.0   
8       0.0    0.0           0.0    0.0     0.0   0.0          0.0       0.0   
9       0.0    0.0           0.0    0.0     0.0   0.0          0.0       0.0   
10      NaN    NaN           NaN    NaN     NaN   NaN          NaN       NaN   

    file  filesystem    i  import  locally-1   menu  metre  option  period  \
0    0.0         0.0  0.0    10.0        0.0    0.0    0.0     0.0     0.0   
1    0.0         0.0  0.0     0.0        0.0  180.0    0.0   180.0     0.0   
2    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
3    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
4    0.0       120.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
5    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
6    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
7    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
8    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
9    0.0         0.0  0.0     0.0        0.0    0.0    0.0     0.0     0.0   
10   NaN         NaN  NaN     NaN        NaN    NaN    NaN     NaN     NaN   

    phone  route  subscription  trial  try  update  video  work  
0    10.0    0.0           0.0    0.0  0.0     0.0    0.0   0.0  
1     0.0    0.0           0.0    0.0  0.0     0.0  180.0   0.0  
2     0.0    0.0           0.0    0.0  0.0     0.0    0.0   0.0  
3     0.0    0.0           0.0    0.0  0.0     0.0    0.0   0.0  
4     0.0  120.0           0.0    0.0  0.0     0.0    0.0   0.0  
5     0.0    0.0           0.0    0.0  0.0    90.0    0.0   0.0  
6     0.0    0.0           0.0    0.0  0.0     0.0    0.0  10.0  
7     0.0    0.0           0.0    0.0  0.0     0.0    0.0   0.0  
8     0.0    0.0           0.0    0.0  0.0     0.0    0.0   0.0  
9     0.0    0.0           0.0    0.0  0.0     0.0    0.0   0.0  
10    NaN    NaN           NaN    NaN  NaN     NaN    NaN   NaN  
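The one-liner above can be broken into its three steps. This is a minimal sketch on a small hypothetical sample (not the question's data), assuming the 'Noun' cells hold Python lists of strings:

```python
import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    'Noun': [['Great', 'app'], ['Offline', 'Maps'], ['application', 'application']],
    'Thumbups': [1.0, 18.0, 16.0],
})

# Join each list into one '|'-separated string, e.g. 'Great|app'
joined = df['Noun'].str.join('|')

# get_dummies() splits on '|' and builds a 0/1 indicator column per noun;
# a noun repeated within one row still yields a single 1
dummies = joined.str.get_dummies(sep='|')

# Scale each row's indicators by that row's Thumbups (axis=0 aligns on the index)
df1 = dummies.mul(df['Thumbups'], axis=0)
print(df1)
```

Because the indicators are 0/1, a noun that appears twice in one review (such as ['application', 'application']) contributes that row's Thumbups once, giving 16 rather than 32. Since mul aligns on the index, the Thumbups Series must share the dataframe's index; any label missing on one side produces a row of NaN.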

Upvotes: 1

karthik reddy

Reputation: 41

One way is to unpack the Noun lists into separate rows and then pivot, for example:


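import pandas as pd

# Unpack the Noun lists: one row per (review, noun) pair.
# The order in each appended list must match the column names below.
rows = []
for _, row in df.iterrows():
    for noun in row['Noun']:
        rows.append([row['Review_Text'], noun, row['Thumbups']])

# Create a new long-format dataframe from the unpacked values
df_new = pd.DataFrame(rows, columns=['Review_Text', 'Noun', 'Thumbups'])

# pivot() raises on duplicate (Review_Text, Noun) pairs such as
# ['application', 'application'], so aggregate with pivot_table instead
pivot_df = df_new.pivot_table(index='Review_Text', columns='Noun',
                              values='Thumbups', aggfunc='sum', fill_value=0)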
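If "adds the Thumbups value into the existing value" is instead meant as one running total per noun across all reviews (an assumption about the question's intent), the unpacked long format can be aggregated directly with groupby. A minimal sketch on a hypothetical sample, assuming pandas 0.25 or later for explode:

```python
import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    'Noun': [['Great', 'app'], ['app', 'update'], ['Great', 'App']],
    'Thumbups': [1.0, 9.0, 0.0],
})

# One row per (review, noun) pair, then sum Thumbups per distinct noun.
# Matching is case-sensitive, so 'App' and 'app' stay separate keys.
totals = df.explode('Noun').groupby('Noun')['Thumbups'].sum()

# Optionally turn the totals into a single-row frame with one column per noun
totals_wide = totals.to_frame().T
print(totals_wide)
```

Here 'app' appears in two reviews, so its total is 1 + 9 = 10. Lower-casing the nouns before grouping would merge 'App' and 'app' if that is preferred.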

Upvotes: 0
