Reputation: 21
My goal is to merge columns from a CSV file containing either a username/IP-address or an email/IP-address combination, after querying for the full username/email information. This seems like a basic inner join, but it is not working.
example rows in original file:

username    email              ip address
NaN         [email protected]  69.0.1.127
Wilco       NaN                69.0.12.128

example rows from query output to csv file that needs to be combined:

username    email              branch
roger1      [email protected]  Los Angelas
Wilco       [email protected]  Montreal

Expected Output:

username    email              branch       ipaddress
roger1      [email protected]  Los Angelas  69.0.1.127
Wilco       [email protected]  Montreal     69.0.12.128
The first part of this program gathers data from the IP-address Excel sheet, where each row has either the username or the email filled out, but not both. The end goal is to combine username, email, and IP address from this sheet and the sheet created by the query.
import glob
import pandas as pd
import cx_Oracle

# dirname holds the directory containing the Excel sheet
all_data = pd.DataFrame()
emailList = []
userList = []
print("Dirname is " + dirname)
# loop over excel sheets and create email and user strings to use for query
for f in glob.glob(dirname + "/Book1.xlsx"):
    df = pd.read_excel(f)
    all_data = all_data.append(df, ignore_index=True)
# split out the rows that have an email vs. the rows that have a username
emailList = all_data.dropna(subset=["email"])
userList = all_data.dropna(subset=["user"])
userList = userList["user"].tolist()
emailList = emailList["email"].tolist()
# user ids are numeric, so no quoting needed: "1001,1002"
userList = ','.join(str(int(u)) for u in userList)
# emails need quoting for the SQL IN clause: "'A@X.COM', 'B@Y.COM'"
emailList = "'" + "', '".join(map(str, emailList)).upper() + "'"
# upper-case in the frame too, so it matches the query output later
all_data['email'] = all_data['email'].str.upper()
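The string building above can be seen in isolation on toy data (made-up values; in real code, bind variables or parameterized queries would be safer than string concatenation):

```python
import pandas as pd

all_data = pd.DataFrame({
    "user": [None, "1001", "1002"],
    "email": ["a@x.com", None, None],
})

# rows that have an email / a username, respectively
emails = all_data.dropna(subset=["email"])["email"].tolist()
users = all_data.dropna(subset=["user"])["user"].tolist()

# emails: quoted and upper-cased for a SQL "IN (...)" clause
email_in = "'" + "', '".join(str(e) for e in emails).upper() + "'"
# numeric user ids: comma-separated, no quoting
user_in = ",".join(str(int(u)) for u in users)
```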
Query building logic here <>
The emails were converted to upper case, with the other syntactic tweaks above, to build an SQL query for the next step.
con = cx_Oracle.connect("*************")
print("connection successful")
df_ora = pd.read_sql(queryStringEmail, con)
df_ora2 = pd.read_sql(queryStringUserList, con)
frames = [df_ora, df_ora2]
con.close()
newtable = pd.concat(frames)
This gives me the list of users I need from the first Excel sheet; all other rows can be ignored except the username/email rows output in this dataframe.
nt = newtable.drop_duplicates(keep='last')
nt.to_csv("newcsv.csv", index=False, encoding='utf-8')
print("Operation successful")
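As a toy illustration of the de-duplication step (made-up rows), `drop_duplicates(keep='last')` keeps the final copy of each fully repeated row:

```python
import pandas as pd

newtable = pd.DataFrame({
    "user": ["roger1", "roger1", "Wilco"],
    "email": ["ROGER1@EXAMPLE.COM", "ROGER1@EXAMPLE.COM", "WILCO@EXAMPLE.COM"],
})

# keep the last copy of each fully-duplicated row
nt = newtable.drop_duplicates(keep='last')
```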
Next, drop duplicates from the second CSV file, since multiple copies of the same username/email/branch combination are useless. There is a lot of spaghetti code here, I apologize, but this is where the logic gets fuzzy.
for f in glob.glob(dirname + "/newcsv.csv"):
    aa = pd.read_csv(f)
all_data.to_csv("newcsvALLDATA.csv", index=False, encoding='utf-8')
aa.to_csv("newcsvALLDATA2.csv", index=False, encoding='utf-8')
Rename the columns in the new CSV file to match the columns in the original IP-address CSV (really just email and user):
aa.columns = ['user','email','first name','last name', 'branch', 'location']
print(all_data)
print(aa)
This is where I have tried all combinations and failed:
all_data = all_data.merge(aa, left_index=True, right_index=True, left_on="IP", how='inner')
print(all_data)
all_data.to_csv("newcsv2.csv", index=False, encoding='utf-8')
aa.to_excel(writer, sheet_name='x2')  # writer is an ExcelWriter created earlier
I cannot print the data at the moment, as I don't have access to the DB, but I can at a later point if someone has any ideas.
Again, the goal is to output a CSV file with the username/email combos matched to the IP-address CSV, which has either an email or a username listed, but not both.
Upvotes: 1
Views: 1070
Reputation: 21
In case anyone runs into this issue in the future, I solved it with the following code:
aa.columns = ['user', 'email', 'first name', 'last name', 'provID', 'provName']
print(aa)
all_dataMerge = pd.merge(aa, all_data, on='user', suffixes=['1', '2'])
The merge happens between the two tables on one key, but it handles both logically. A simple answer that took me forever to figure out. Hope this helps someone in the future.
# Logic to combine the dataframes into final form, similar to VLOOKUP
data_final = all_dataMerge.drop_duplicates(keep='first')
Upvotes: 1