Python Pandas Looping

Question

I'm relatively new to python and pandas (and coding more generally).

I've been trying to figure out the best way to concatenate specific rows in a pd dataframe. I have a df that looks something like

d = {'sheet_index':[1,2,2,3,3,3,4,5], 
'Type':['Note','Note','Note','Note','Note','Note','Note','Note'], 
'Strings_To_Concat':['string1','string2','string3','string4','string5','string6','string7','string8']}
df = pd.DataFrame(data=d)
df

          sheet_index   Type    Strings_To_Concat
0         1             Note    string1
1         2             Note    string2
2         2             Note    string3
3         3             Note    string4
4         3             Note    string5
5         3             Note    string6
6         4             Note    string7
7         5             Note    string8

I want to concatenate the strings in the 'Strings_To_Concat' column if they have the same sheet_index value and are of the Type 'Note' while creating null values in duplicate sheet_index values and not doing anything if the sheet_index isn't duplicated. The result should look something like the below:

          sheet_index   Type    Strings_To_Concat
0         1             Note    string1
1         2             Note    null
2         2             Note    string3 string2
3         3             Note    null
4         3             Note    null
5         3             Note    string6 string5 string4
6         4             Note    string7
7         5             Note    string8

The problem I'm running into is really just an algorithmic one as I've said I'm relatively new to coding. I've tried looping through the pd to identify which rows are designated as Type 'Note' since there are several types in my dataset and from there if the sheet index is the same as the next one concatenate the next one with the current one.

    for i in range(0,len(df)-1):
         if(df.iloc[i,1] == 'Note' and df.iloc[i,0] != df.iloc[i+1,0]):
            df.loc[i,2] = str(df.iloc[i,2])
         elif(df.iloc[i,1] == 'Note' and df.iloc[i,0] == df.iloc[i+1,0]):
            df.loc[i+1,2] = str(df.iloc[i+1,2]) + ' ' + str(df.iloc[i,2])

However, this is producing a strange result. Its creating a 4th column with the header '2' and giving me:

          sheet_index   Type    Strings_To_Concat            2
0         1             Note    string1                      string1
1         2             Note    string2                      null
2         2             Note    string3                      string3
3         3             Note    string4                      null
4         3             Note    string5                      string5 string4
5         3             Note    string6                      string6
6         4             Note    string7                      string7
7         5             Note    string8                      null

Can someone help me explain why its giving me a new column and the loop isn't behaving as I am suspecting it should? Is there a more elegant solution using pandas methods to doing what I'm trying to achieve other than just using basic for loops?

Any help is very much appreciated

Python Pandas Looping

Answers (1)

Related Questions