Reputation: 41
I have a Pandas dataframe like this :
id A B
0 1 toto+tata
1 1 toto+tata
2 2 titi+tutu
3 2 titi+tutu
4 3 toto+tata+titi
5 3 toto+tata+titi
6 3 toto+tata+titi
Thanks to the split
function, I can split these string values :
for i in range(len(df)):
var = output.iloc[i, 1].split("+")
print(var)
['toto', 'tata']
['toto', 'tata']
['titi', 'tutu']
['titi', 'tutu']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
['toto', 'tata', 'titi']
Now, I want to put these values in a new column, one value after the other :
id A B C
0 1 toto+tata toto
1 1 toto+tata tata
2 2 titi+tutu titi
3 2 titi+tutu tutu
4 3 toto+tata+titi toto
5 3 toto+tata+titi tata
6 3 toto+tata+titi titi
I tried to do this :
for i in range(len(df)):
var = df.iloc[i, 1].split("+")
print(var)
for y in range(len(var)):
df.at[i, 'C'] = var[y]
But it always returns the last value of the split :
id A B C
0 1 toto+tata tata
1 1 toto+tata tata
2 2 titi+tutu tutu
3 2 titi+tutu tutu
4 3 toto+tata+titi titi
5 3 toto+tata+titi titi
6 3 toto+tata+titi titi
I'm missing this little detail for my algo to work but I can't find it.
Upvotes: 1
Views: 211
Reputation: 260690
Assuming you always have as many rows in a group that there are '+' separated items, that the groups are consecutive, and that the strings are identical per group.
A simple way is to remove the duplicates, str.split
, and explode
:
df['C'] = df.groupby('A')['B'].first().str.split('+').explode().values
output:
id A B C
0 0 1 toto+tata toto
1 1 1 toto+tata tata
2 2 2 titi+tutu titi
3 3 2 titi+tutu tutu
4 4 3 toto+tata+titi toto
5 5 3 toto+tata+titi tata
6 6 3 toto+tata+titi titi
If the rows are not grouped per consecutive "group", apply the same logic per group with groupby
+transform
:
Example:
# shuffle the rows to generate an example
df2 = df.sample(frac=1)
# extract the chunks
df2['C'] = df2.groupby('A')['B'].transform(lambda x: x.head(1).str.split('+').explode().values)
output:
id A B C
4 4 3 toto+tata+titi toto
1 1 1 toto+tata toto
0 0 1 toto+tata tata
3 3 2 titi+tutu titi
6 6 3 toto+tata+titi tata
5 5 3 toto+tata+titi titi
2 2 2 titi+tutu tutu
Upvotes: 1