OverDose
OverDose

Reputation: 13

How to add different column data in each duplicated csv row using python?

I have the following scenario: I have a train.csv file as the one below. Each row is mentioned 4 times with same index value.

Index sentence ending0 ending1 ending2 ending3 

0        ABC     DEF     GHI     JKL     MNO     
0        ABC     DEF     GHI     JKL     MNO       
0        ABC     DEF     GHI     JKL     MNO     
0        ABC     DEF     GHI     JKL     MNO       
1        LKJ     KJS     AJA     QHW     IUH             
...      ...     ...     ...     ...     ...
...
...  
2 
...
...
...     

What i am wanting to get is shown below:

Index sentence ending-id ending 
0       ABC       0        DEF    
0       ABC       1        GHI    
0       ABC       2        JKL    
0       ABC       3        MNO    
1       LKJ       0        KJS 
...     ...      ...       ...
...
...   

Upvotes: 0

Views: 51

Answers (4)

OverDose
OverDose

Reputation: 13

@MrNobody33 I am getting the below result with this code so far.

sentence ending ending-id 
ABC        ABC     0     
ABC        DEF     1  
ABC        GHI     2
ABC        JKL     3
ABC        MNO     0   

while i am looking for a result like the one below:

Index sentence ending-id ending 
0       ABC       0        DEF    
0       ABC       1        GHI    
0       ABC       2        JKL    
0       ABC       3        MNO  

Upvotes: 0

MrNobody33
MrNobody33

Reputation: 6483

You could try something like this:

from itertools import cycle
df=df.set_index('Index').drop_duplicates()
newdf= pd.DataFrame(data=df.sentence, columns=['sentence'], index=df.index)
newdf['ending']=df[df.columns[1:]].values.tolist()
newdf=newdf.explode('ending')
ids = cycle([0,1,2,3])
newdf.insert(1, 'endingid', [next(ids) for idx in range(len(newdf))])
print(newdf)

Output:

      sentence  endingid ending
Index                          
0          ABC         0    DEF
0          ABC         1    GHI
0          ABC         2    JKL
0          ABC         3    MNO
1          LKJ         0    KJS
1          LKJ         1    AJA
1          LKJ         2    QHW
1          LKJ         3    IUH

Upvotes: 1

warped
warped

Reputation: 9491

df = _df.copy()
df = pd.melt(df.drop_duplicates(), id_vars=['sentence', 'Index'], value_vars=['ending0','ending1','ending2','ending3'])
df['ending-id'] = df.variable.str.extract('([0-9]+)')
df.rename(columns={'value':'ending'}, inplace=True)
df.drop('variable', axis=1, inplace=True)
df.set_index('Index', inplace=True)

Upvotes: 0

OverDose
OverDose

Reputation: 13

I am getting the below result with this code so far.

sentence Index value ending
ABC        0    DEF    0
ABC        0    DEF    0
ABC        0    DEF    0

while i am looking for a result like the one below:

Index sentence ending-id ending 
0       ABC       0        DEF    
0       ABC       1        GHI    
0       ABC       2        JKL    
0       ABC       3        MNO  

Upvotes: 0

Related Questions