mcgarrya28
mcgarrya28

Reputation: 21

Creating a new dataframe that only pulls out rows contains a specific word

I am working on a project that identifies the differences between 2 Excel workbooks. Where there is a difference in the cell value the code populates the cell with the string "old value Changed to "new value"

df_orig in the first line of code below contains the comparison with all changes highlighted. I then began creating dataframes to pull out the rows in the df_orig that contains the value "Change" as you will see below.

What I want to do is check every column in the df_orig and if the value "Change" appears in any row to pull out that row and create a new dataframe that contains all rows with that "Change" value and output it to a new Excel workbook. In my df_orig there are 187 columns so my method below of creating dataframe after dataframe is not going to work.

df_orig = pd.read_excel ('Horizon/Output/Full_LeaseOutput.xlsx')

df1 = df[df['Lease'].str.contains('Changed', na=False)]
df2 = df[df['Lease Description'].str.contains('Changed', na=False)]
df3 = df[df['Type'].str.contains('Changed', na=False)]
df4 = df[df['Tenant Ref'].str.contains('Changed', na=False)]
df5 = df[df['Tenant Name'].str.contains('Changed', na=False)]
df6 = df[df['Property Ref'].str.contains('Changed', na=False)]

df7 = pd.concat([df1,df2,df3,df4,df5,df6])
df7.to_excel('Horizon/Output/Differences.xlsx', index=True,header=True)

Upvotes: 2

Views: 734

Answers (1)

Rishabh Kumar
Rishabh Kumar

Reputation: 2430

Just modifying your code little bit

df_orig = pd.read_excel ('Horizon/Output/Full_LeaseOutput.xlsx')

ind = df['Lease'].str.contains('Changed', na=False)&\
      df['Lease Description'].str.contains('Changed', na=False)&\
      df['Type'].str.contains('Changed', na=False)&\
      df['Tenant Ref'].str.contains('Changed', na=False)&\
      df['Tenant Name'].str.contains('Changed', na=False)&\
      df['Property Ref'].str.contains('Changed', na=False)

dfnew = df[ind] 

dfnew.to_excel('Horizon/Output/Differences.xlsx', index=True,header=True)

Each condition provides a logical vector for the matching row to be True. So using & we can select the rows which have to be True for every condition.

Upvotes: 2

Related Questions