Jonathan Tran
Jonathan Tran

Reputation: 13

Python Pandas Looping

I'm relatively new to python and pandas (and coding more generally).

I've been trying to figure out the best way to concatenate specific rows in a pd dataframe. I have a df that looks something like

d = {'sheet_index':[1,2,2,3,3,3,4,5], 
'Type':['Note','Note','Note','Note','Note','Note','Note','Note'], 
'Strings_To_Concat':['string1','string2','string3','string4','string5','string6','string7','string8']}
df = pd.DataFrame(data=d)
df

          sheet_index   Type    Strings_To_Concat
0         1             Note    string1
1         2             Note    string2
2         2             Note    string3
3         3             Note    string4
4         3             Note    string5
5         3             Note    string6
6         4             Note    string7
7         5             Note    string8

I want to concatenate the strings in the 'Strings_To_Concat' column if they have the same sheet_index value and are of the Type 'Note' while creating null values in duplicate sheet_index values and not doing anything if the sheet_index isn't duplicated. The result should look something like the below:

          sheet_index   Type    Strings_To_Concat
0         1             Note    string1
1         2             Note    null
2         2             Note    string3 string2
3         3             Note    null
4         3             Note    null
5         3             Note    string6 string5 string4
6         4             Note    string7
7         5             Note    string8

The problem I'm running into is really just an algorithmic one as I've said I'm relatively new to coding. I've tried looping through the pd to identify which rows are designated as Type 'Note' since there are several types in my dataset and from there if the sheet index is the same as the next one concatenate the next one with the current one.

    for i in range(0,len(df)-1):
         if(df.iloc[i,1] == 'Note' and df.iloc[i,0] != df.iloc[i+1,0]):
            df.loc[i,2] = str(df.iloc[i,2])
         elif(df.iloc[i,1] == 'Note' and df.iloc[i,0] == df.iloc[i+1,0]):
            df.loc[i+1,2] = str(df.iloc[i+1,2]) + ' ' + str(df.iloc[i,2])

However, this is producing a strange result. Its creating a 4th column with the header '2' and giving me:

          sheet_index   Type    Strings_To_Concat            2
0         1             Note    string1                      string1
1         2             Note    string2                      null
2         2             Note    string3                      string3
3         3             Note    string4                      null
4         3             Note    string5                      string5 string4
5         3             Note    string6                      string6
6         4             Note    string7                      string7
7         5             Note    string8                      null

Can someone help me explain why its giving me a new column and the loop isn't behaving as I am suspecting it should? Is there a more elegant solution using pandas methods to doing what I'm trying to achieve other than just using basic for loops?

Any help is very much appreciated

Upvotes: 1

Views: 46

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71687

Solution:

m1 = df['Type'].eq('Note')
m2 = df['sheet_index'].duplicated(keep='last')

df['col'] = df[m1].groupby('sheet_index')['Strings_To_Concat'].transform(', '.join)
df['col'] = df['col'].fillna(df['Strings_To_Concat']).mask(m1 & m2)

How it works?

  • Select the rows where Type=Note, then group by sheet_index and transform Strings_to_Concat using ','.join and assign this result to new col
  • Fill the null values in new col using the values from Strings_To_Concat so that we can preserve the other values where Type!=Note
  • Finally mask the values which are duplicated and have Type=Note

Result

   sheet_index  Type Strings_To_Concat                        col
0            1  Note           string1                    string1
1            2  Note           string2                        NaN
2            2  Note           string3           string2, string3
3            3  Note           string4                        NaN
4            3  Note           string5                        NaN
5            3  Note           string6  string4, string5, string6
6            4  Note           string7                    string7
7            5  Note           string8                    string8

Upvotes: 1

Related Questions