Reputation: 33
I am new to python and Pandas and I was looking for this answer for the past few weeks, but I hit a brick wall. Although, I normally find my answers in here or somewhere else on the net - I wasn't lucky this time.
Problem:
I have an excel file which I need to loop through to create a script for a firewall.
Column A, row 1,2,3 are merged and contains one Hostname Column B, row 1,2,3 are not merged and each row contains one IP address each
How do I write a condition to say if column A has any merged rows then check and merge the same row's in column B?
import pandas as pd
df = pd.read_excel('file.xlsx',sheetname="sheet_name")
df1 = df['Unnamed: 2'].dropna().drop([6,7]) # Unanamed is the column ID then Delete empty cell's and delete row 6&7
df2 = df['Unnamed: 4'].dropna().drop([6,7]) # Unanamed is the column ID then Delete empty cell's and delete row 6&7
for a, b in zip(df1,df2):
# The condition I am looking for, please note the code below is not correct
if a is merged then merge b:
print('Hostname {} ip address {}'.format(a,b))
Thank you in advance.
Upvotes: 3
Views: 1526
Reputation: 2152
If you load only the input part (red square in your question) as df
, and if the column 'Host name'
is merged, it would look like this (df
)
Host name IP address
LAX_R 192.168.1.1
NaN 192.168.1.2
NaN 192.168.1.3
By the way, you can check if the column 'Host name'
is initially merged by checking if it contains NaN
value (Note: I'm assuming there should not be NaN
values if it is not initially merged):
if sum(df['Host name'].isnull()) > 0:
And then, by forward filling NaN,
df.ffill(inplace=True)
we get
Host name IP address
LAX_R 192.168.1.1
LAX_R 192.168.1.2
LAX_R 192.168.1.3
After that, IP address
of the same Host name
are aggregated to the same cell as a list
df = pd.DataFrame(df.groupby('Host name')['IP address'].apply(list))
At this point, df looks like this
IP address
Host name
LAX_R [192.168.1.1, 192.168.1.2, 192.168.1.3]
The next step is to remove some unnecessary characters
df.columns = ['IP address']
for ch in ["[", "]", ",", "'"]:
df['IP address'] = df['IP address'].apply(lambda x: str(x).replace(ch, ""))
(Optional) For "Apache Open Office", displaying each IP address in new line is done by
df['IP address'] = df['IP address'].apply(lambda x: x.replace(" ", "\n"))
Finally save it.
df.to_excel("yourfile.xlsx", index=True)
(Appendix) For "Microsoft Office", displaying each IP address in new line is done by
It's a bit lengthy, but it should do the task. Hope it helps.
Upvotes: 1