Anton
Anton

Reputation: 33

Python Pandas conditional row merge

I am new to python and Pandas and I was looking for this answer for the past few weeks, but I hit a brick wall. Although, I normally find my answers in here or somewhere else on the net - I wasn't lucky this time.

Problem:

I have an excel file which I need to loop through to create a script for a firewall.

Column A, row 1,2,3 are merged and contains one Hostname Column B, row 1,2,3 are not merged and each row contains one IP address each

How do I write a condition to say if column A has any merged rows then check and merge the same row's in column B?

My Code:

import pandas as pd

df = pd.read_excel('file.xlsx',sheetname="sheet_name")

df1 = df['Unnamed: 2'].dropna().drop([6,7]) # Unanamed is the column ID then Delete empty cell's and delete row 6&7
df2 = df['Unnamed: 4'].dropna().drop([6,7]) # Unanamed is the column ID then Delete empty cell's and delete row 6&7

for a, b in zip(df1,df2):
  # The condition I am looking for, please note the code below is not correct
  if a is merged then merge b:
      print('Hostname {} ip address {}'.format(a,b))

enter image description here

Thank you in advance.

Upvotes: 3

Views: 1526

Answers (1)

gyoza
gyoza

Reputation: 2152

If you load only the input part (red square in your question) as df, and if the column 'Host name' is merged, it would look like this (df)

Host name IP address
LAX_R     192.168.1.1
NaN       192.168.1.2
NaN       192.168.1.3

By the way, you can check if the column 'Host name' is initially merged by checking if it contains NaN value (Note: I'm assuming there should not be NaN values if it is not initially merged):

if sum(df['Host name'].isnull()) > 0:

And then, by forward filling NaN,

df.ffill(inplace=True)

we get

Host name IP address
LAX_R     192.168.1.1
LAX_R     192.168.1.2
LAX_R     192.168.1.3

After that, IP address of the same Host name are aggregated to the same cell as a list

df = pd.DataFrame(df.groupby('Host name')['IP address'].apply(list))

At this point, df looks like this

           IP address
Host name   
LAX_R      [192.168.1.1, 192.168.1.2, 192.168.1.3]

The next step is to remove some unnecessary characters

df.columns = ['IP address']
for ch in ["[", "]", ",", "'"]:
    df['IP address'] = df['IP address'].apply(lambda x: str(x).replace(ch, ""))

(Optional) For "Apache Open Office", displaying each IP address in new line is done by

df['IP address'] = df['IP address'].apply(lambda x: x.replace(" ", "\n"))

Finally save it.

df.to_excel("yourfile.xlsx", index=True)

(Appendix) For "Microsoft Office", displaying each IP address in new line is done by

  1. Skip (Optional) part above
  2. Choose column "IP address"
  3. Choose Home --> Wrap Text as the following red square

Choose Home-->Wrap Text

It's a bit lengthy, but it should do the task. Hope it helps.

Upvotes: 1

Related Questions