Reputation: 103

How to transpose specific columns based on a condition in Python 2.7

I have the following data format in a file:

ID,var_name,var_value
1,ABC,This is abc1
1,DEF,This is def1
2,ABC,This is abc2
2,DEF,This is def2
2,GHI,This is ghi2
3,ABC,This is abc3
4,ABC,This is abc4
4,DEF,This is def4

I also have a header list: ['ABC','GHI']

In the above dataset, not every ID has all the variables; ID 2 has the most (ABC, DEF, GHI). I need to convert the dataset to the following nested list format:

[['ID','ABC','GHI'], [1,'This is abc1', ''],[2, 'This is abc2','This is ghi2'],[3,'This is abc3',''],[4,'This is abc4','']]

That means the list should:

contain a row for every ID
use an empty string for any variable that is missing for an ID in the above dataset
keep the columns in the same order as the header list
include only the variables named in the header list: since it contains only 'ABC' and 'GHI', the nested list should hold only those values and ignore the 'DEF' rows

I want to do this in Python 2.7, possibly using pandas.
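For reference, the answers below all start from a DataFrame named df. The question does not give the file name, so this minimal sketch reads an inline copy of the sample instead; the name raw is only illustrative, and with a real file its path would be passed to pd.read_csv in place of the StringIO object.

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (the real file name is not given).
raw = u"""ID,var_name,var_value
1,ABC,This is abc1
1,DEF,This is def1
2,ABC,This is abc2
2,DEF,This is def2
2,GHI,This is ghi2
3,ABC,This is abc3
4,ABC,This is abc4
4,DEF,This is def4
"""

# read_csv accepts a path or any file-like object; the first row becomes the header.
df = pd.read_csv(io.StringIO(raw))
print(df)
```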

Upvotes: 1

Answers (3)

Ken T

Reputation: 2553

I think you should try to stay within this beautiful pandas DataFrame:

df2=(df.pivot(index='ID', columns='var_name', values='var_value')
     .fillna('').drop('DEF', axis=1).reset_index())

# Output:
var_name  ID           ABC           GHI
0          1  This is abc1              
1          2  This is abc2  This is ghi2
2          3  This is abc3              
3          4  This is abc4

You can also go one step further to get the nested list:

print([df2.columns.tolist()] + df2.values.tolist())

[['ID', 'ABC', 'GHI'], 
[1, 'This is abc1', ''], 
[2, 'This is abc2', 'This is ghi2'], 
[3, 'This is abc3', ''], 
[4, 'This is abc4', '']]

Upvotes: 1

jezrael

Reputation: 862406

Use:

L = ['ABC','GHI']

df1 = df.pivot(index='ID', columns='var_name', values='var_value').fillna('')[L].reset_index()
print (df1)
var_name  ID           ABC           GHI
0          1  This is abc1              
1          2  This is abc2  This is ghi2
2          3  This is abc3              
3          4  This is abc4     

L1 = [df1.columns.tolist()] + df1.values.tolist()
print (L1)

[['ID', 'ABC', 'GHI'], 
 [1, 'This is abc1', ''], 
 [2, 'This is abc2', 'This is ghi2'],
 [3, 'This is abc3', ''], 
 [4, 'This is abc4', '']]

Explanation:

First pivot the data, replace the NaNs with fillna, select the header columns with [L], and turn the index back into a column with reset_index.
Finally, build the nested list, with the column names as its first row.
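The explanation above can be followed one step at a time. This is only a sketch: the intermediate names (wide, filled, subset) are illustrative, and the DataFrame is rebuilt inline from the question's sample so the block runs on its own.

```python
import pandas as pd

# Sample data from the question, built inline so the sketch is self-contained.
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 3, 4, 4],
    'var_name': ['ABC', 'DEF', 'ABC', 'DEF', 'GHI', 'ABC', 'ABC', 'DEF'],
    'var_value': ['This is abc1', 'This is def1', 'This is abc2', 'This is def2',
                  'This is ghi2', 'This is abc3', 'This is abc4', 'This is def4'],
})
L = ['ABC', 'GHI']

# 1. Pivot: one row per ID, one column per var_name, NaN where a pair is missing.
wide = df.pivot(index='ID', columns='var_name', values='var_value')

# 2. Replace the NaNs with empty strings.
filled = wide.fillna('')

# 3. Keep only the header columns, in the order given by L.
subset = filled[L]

# 4. Turn the ID index back into an ordinary column.
df1 = subset.reset_index()

# 5. Column names first, then one list per row.
L1 = [df1.columns.tolist()] + df1.values.tolist()
print(L1)
```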

EDIT:

I tried changing the order of the values in the list:

L = ['GHI', 'ABC']
df1 = df.pivot(index='ID', columns='var_name', values='var_value').fillna('')[L].reset_index()
print (df1)
var_name  ID           GHI           ABC
0          1                This is abc1
1          2  This is ghi2  This is abc2
2          3                This is abc3
3          4                This is abc4

L1 = [df1.columns.tolist()] + df1.values.tolist()
print (L1)

[['ID', 'GHI', 'ABC'],
 [1, '', 'This is abc1'], 
 [2, 'This is ghi2', 'This is abc2'], 
 [3, '', 'This is abc3'], 
 [4, '', 'This is abc4']]
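One case the sample data does not exercise: selecting with [L] raises a KeyError if the header list names a variable that never occurs in var_name. A possible variant, assuming such headers should still appear as empty columns, swaps the selection for reindex. The header 'XYZ' below is hypothetical and exists only to show the behaviour.

```python
import pandas as pd

# Sample data from the question, built inline so the sketch is self-contained.
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 3, 4, 4],
    'var_name': ['ABC', 'DEF', 'ABC', 'DEF', 'GHI', 'ABC', 'ABC', 'DEF'],
    'var_value': ['This is abc1', 'This is def1', 'This is abc2', 'This is def2',
                  'This is ghi2', 'This is abc3', 'This is abc4', 'This is def4'],
})

# 'XYZ' is a hypothetical header with no matching rows in the data.
L = ['ABC', 'XYZ']

df1 = (df.pivot(index='ID', columns='var_name', values='var_value')
         .fillna('')                          # existing gaps become ''
         .reindex(columns=L, fill_value='')   # keep L's order, add missing headers as ''
         .reset_index())

L1 = [df1.columns.tolist()] + df1.values.tolist()
print(L1)
```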

Upvotes: 1

AChampion

Reputation: 30258

Alternatively, you can just set a MultiIndex and unstack:

In []:
L = ['ABC', 'GHI']
df = df.set_index(['ID', 'var_name'])['var_value'].unstack(fill_value='')[L].reset_index()
df

Out[]:
var_name  ID           ABC           GHI
0          1  This is abc1              
1          2  This is abc2  This is ghi2
2          3  This is abc3              
3          4  This is abc4              

In []:
[df.columns.tolist()] + df.values.tolist()

Out[]:
[['ID', 'ABC', 'GHI'],
 [1, 'This is abc1', ''],
 [2, 'This is abc2', 'This is ghi2'],
 [3, 'This is abc3', ''],
 [4, 'This is abc4', '']]
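Both pivot and unstack raise a ValueError when the same (ID, var_name) pair occurs more than once. The question's sample has no such duplicates, so the following is only a sketch for messier input, under the assumption that keeping the first value per pair is acceptable. The duplicated rows are made up for illustration.

```python
import pandas as pd

# Made-up data with a duplicated (ID, var_name) pair: ID 1 has 'ABC' twice.
df = pd.DataFrame({
    'ID': [1, 1, 2],
    'var_name': ['ABC', 'ABC', 'GHI'],
    'var_value': ['first abc1', 'second abc1', 'This is ghi2'],
})
L = ['ABC', 'GHI']

# pivot_table aggregates duplicates instead of raising; 'first' keeps the
# first value seen for each (ID, var_name) pair.
wide = df.pivot_table(index='ID', columns='var_name',
                      values='var_value', aggfunc='first')

df1 = wide.fillna('')[L].reset_index()
L1 = [df1.columns.tolist()] + df1.values.tolist()
print(L1)
```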

Upvotes: 1
