Merging two excel files based on columns in dataframes VLookup style

Question

My goal is to merge columns from a csv file containing either a username/ipaddress or an email/ipaddress combination after querying full username/email information. This seems like a basic inner join but is not working

example row in original file:
username     email                  ip address
NaN           user@user.com           69.0.1.127
Wilco         NaN                     69.0.12.128

example row from query output to csv file that needs to be combined:

username     email              branch

roger1       user@user.com      Los Angelas
Wilco        user2@user.com     Montreal

Expected Output:

username       email              branch           ipaddress
roger1         user@user.com      Los Angelas      69.0.1.127
Wilco          user2@user.com     Montreal         69.0.12.128

First part of this program is to gather data from the ip address excel sheet that has either username or email filled out, but not both... All of this is to combine username email and ipaddress at the end from this sheet and the sheet created.

directory variable = dirname

all_data = pd.DataFrame()

emailList=[]
userList = []
print "Dirname is " + dirname
#loop over excel sheets and create email and user strings to use for query
for f in glob.glob(dirname+"/Book1.xlsx"):
    df=pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)

    all_data = all_data.append(df,ignore_index=True)
    emailList = all_data.dropna(subset=["email"])
    userList = all_data.dropna(subset=["user"])

    userList = userList["user"].tolist()
    emailList = emailList["email"].tolist()
    userList = map(int,userList)
    userList = ','.join(map(str,userList))
    emailList = "', '".join(map(str, emailList))

    emailList = "'" + emailList.upper() + "'"

all_data['email'] = all_data['email'].str.upper()

Query building logic here <>

Converted email to upper and other syntactic for an sql query for next step

con=cx_Oracle.connect("*************")
print "connection successful"
df_ora = pd.read_sql(queryStringEmail,con)
df_ora2 = pd.read_sql(queryStringUserList, con)
frames = [df_ora,df_ora2]
con.close()
newtable = pd.concat(frames)

This gives me a list of the users I need from the first excel sheet in the program.. all other rows can be ignored except the usernames/emails output in this dataframe.

nt = newtable.drop_duplicates(keep='last')

nt.to_csv("newcsv.csv", index=False, encoding='utf-8')
print "Operation successful"

Drop duplicates from second csv file as multiple of the same username/email/branch combos are useless

Lots of spaghetti code here, I apologize, but this is where the logic gets fuzzy

for f in glob.glob(dirname+"/newcsv.csv"):
    aa=pd.read_csv(f)
all_data.to_csv("newcsvALLDATA.csv", index=False, encoding='utf-8')
aa.to_csv("newcsvALLDATA2.csv", index=False, encoding='utf-8')

rename columns in the new csv file to match up with columns in original ip address csv (really just email and user)

aa.columns = ['user','email','first name','last name', 'branch', 'location']
print all_data
print aa

This is where I have tried all combinations and failed:

all_data = 

all_data.merge(aa,left_index=True,right_index=True,left_on="IP",how='inner')

print all_data
all_data.to_csv("newcsv2.csv", index=False, encoding='utf-8')
aa.to_excel(writer, sheet_name = 'x2')

I cannot print the data at the moment as I don't have access to db but can at a later point if someone has any idea

Again the goal is to output a csv file with the username/email combos matched to ipaddress csv that has either an email or username listed but not both

Merging two excel files based on columns in dataframes VLookup style

Answers (1)

Related Questions