Reputation: 13
I have the following scenario: I have a train.csv file as the one below. Each row is mentioned 4 times with same index value.
Index sentence ending0 ending1 ending2 ending3
0 ABC DEF GHI JKL MNO
0 ABC DEF GHI JKL MNO
0 ABC DEF GHI JKL MNO
0 ABC DEF GHI JKL MNO
1 LKJ KJS AJA QHW IUH
... ... ... ... ... ...
...
...
2
...
...
...
What i am wanting to get is shown below:
Index sentence ending-id ending
0 ABC 0 DEF
0 ABC 1 GHI
0 ABC 2 JKL
0 ABC 3 MNO
1 LKJ 0 KJS
... ... ... ...
...
...
Upvotes: 0
Views: 51
Reputation: 13
@MrNobody33 I am getting the below result with this code so far.
sentence ending ending-id
ABC ABC 0
ABC DEF 1
ABC GHI 2
ABC JKL 3
ABC MNO 0
while i am looking for a result like the one below:
Index sentence ending-id ending
0 ABC 0 DEF
0 ABC 1 GHI
0 ABC 2 JKL
0 ABC 3 MNO
Upvotes: 0
Reputation: 6483
You could try something like this:
from itertools import cycle
df=df.set_index('Index').drop_duplicates()
newdf= pd.DataFrame(data=df.sentence, columns=['sentence'], index=df.index)
newdf['ending']=df[df.columns[1:]].values.tolist()
newdf=newdf.explode('ending')
ids = cycle([0,1,2,3])
newdf.insert(1, 'endingid', [next(ids) for idx in range(len(newdf))])
print(newdf)
Output:
sentence endingid ending
Index
0 ABC 0 DEF
0 ABC 1 GHI
0 ABC 2 JKL
0 ABC 3 MNO
1 LKJ 0 KJS
1 LKJ 1 AJA
1 LKJ 2 QHW
1 LKJ 3 IUH
Upvotes: 1
Reputation: 9491
df = _df.copy()
df = pd.melt(df.drop_duplicates(), id_vars=['sentence', 'Index'], value_vars=['ending0','ending1','ending2','ending3'])
df['ending-id'] = df.variable.str.extract('([0-9]+)')
df.rename(columns={'value':'ending'}, inplace=True)
df.drop('variable', axis=1, inplace=True)
df.set_index('Index', inplace=True)
Upvotes: 0
Reputation: 13
I am getting the below result with this code so far.
sentence Index value ending
ABC 0 DEF 0
ABC 0 DEF 0
ABC 0 DEF 0
while i am looking for a result like the one below:
Index sentence ending-id ending
0 ABC 0 DEF
0 ABC 1 GHI
0 ABC 2 JKL
0 ABC 3 MNO
Upvotes: 0