user5596139

Reputation: 41

Python: CSV to dictionary, then to another CSV

I have this CSV (file):

uid,aabreo
objectClass,top
objectClass,inetOrgPerson
objectClass,UnabPerson
cn,Angela Abreo Garcia
sn,Abreo Garcia
administrativo,no
AdmPortales,no
AdmSisInformacion,no

uid,aabreo265
objectClass,top
objectClass,inetOrgPerson
objectClass,UnabPerson
cn,ANDRES FELIPE ABREO SERRANO
sn,ABREO SERRANO
administrativo,no

uid,aabreo602
objectClass,top
objectClass,inetOrgPerson
objectClass,UnabPerson
cn,ANDRES FELIPE ABREO SERRANO
sn,ABREO SERRANO
administrativo,no

uid,aabril
objectClass,top
objectClass,inetOrgPerson
objectClass,UnabPerson
cn,ALEYDA SMITH ABRIL RINCON
sn,ABRIL RINCON
administrativo,no

I want to write another CSV in which the first column (the keys) becomes the header row and the second column supplies the values.

Here is my code:


import csv
import pandas as pd

f= open(r"C:\Users\USER\Downloads\LDAP_1.csv",encoding="utf-8")
print (f.read())

datos = pd.read_csv(r"C:\Users\USER\Downloads\LDAP_1.csv",header=0)
#print(datos)

dict_data={}
with open(r"C:\Users\USER\Downloads\LDAP_1.csv",encoding="utf-8") as file:
    dict_data= dict(filter(None,csv.reader(file)))
    
print(dict_data)    

#print(dict_data.values())
#print(dict_data.keys())



#csv_columns=['uid','objectClass','objectClass','objectClass','cn','sn']

csv_file = "Names.csv"
try:
    with open(csv_file, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=dict_data.keys())
        writer.writeheader()
        for data in dict_data.values():
            writer.writerow(dict_data)
except IOError:
    print("I/O error")

But the result contains only the last record, repeated, with blank lines between the rows, and I don't know what I did wrong.

uid,objectClass,cn,sn,administrativo,AdmPortales,AdmSisInformacion

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

aabril,UnabPerson,ALEYDA SMITH ABRIL RINCON,ABRIL RINCON,no,no,no

Upvotes: 0

Views: 52

Answers (1)

Serge Ballesta

Reputation: 148900

Your initial file is not a CSV file. In a CSV file, a record should be contained in a single row, while in your file a record only ends at an empty line. Using pandas' CSV reader to process it is close to using a hammer to drive a screw: if the hammer is heavy enough, the screw will end up in the board, yet it is not the right tool.

That means that you have a text file with a custom structure, so my opinion is that you should use a custom parser to build the records and then write those records to a true csv file directly with the csv module. You could of course use pandas here, but (still IMHO) the added value is not worth it.
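As a rough illustration of such a custom parser, here is a minimal sketch. The function name `parse_records` and the in-memory sample are hypothetical, and it assumes every non-empty line is `key,value` with no comma inside the key (the split happens on the first comma only, so values may contain commas).

```python
import io

def parse_records(lines):
    """Yield one list of (key, value) pairs per blank-line-separated block."""
    record = []
    for line in lines:
        line = line.strip()
        if not line:
            # An empty line closes the current record, if there is one.
            if record:
                yield record
                record = []
            continue
        # Split on the first comma only, so values may contain commas.
        key, _, value = line.partition(",")
        record.append((key, value))
    # The last record may not be followed by an empty line.
    if record:
        yield record

# Small in-memory sample standing in for the real file (assumption).
sample = io.StringIO(
    "uid,aabreo\n"
    "objectClass,top\n"
    "objectClass,inetOrgPerson\n"
    "cn,Angela Abreo Garcia\n"
    "\n"
    "uid,aabril\n"
    "objectClass,top\n"
    "cn,ALEYDA SMITH ABRIL RINCON\n"
)
records = list(parse_records(sample))
print(len(records))   # number of records found
print(records[0])     # pairs of the first record, duplicates preserved
```

Keeping each record as a list of pairs rather than a dict preserves repeated keys such as objectClass, so the decision about how to handle them can be made later. With a real file, the same function can be fed the open file object instead of the `StringIO` sample.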

But your problem is directly caused by the objectClass field being multi-valued in the LDAP database. You are trying to build a CSV with duplicated column names, which should be avoided, and to use a dict for it, which is not possible because a key has to be unique in a dict.
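A small demonstration of that effect, using a hypothetical list of pairs shaped like the rows of the question's file: building a dict from them keeps only the last value seen for each key, which is why only the last uid and the last objectClass survive. Collecting the values in lists is one possible way to keep them all.

```python
# Pairs as csv.reader would produce them (hypothetical shortened sample).
pairs = [
    ("uid", "aabreo"),
    ("objectClass", "top"),
    ("objectClass", "inetOrgPerson"),
    ("objectClass", "UnabPerson"),
    ("uid", "aabril"),
]

# dict() keeps one entry per key; later values overwrite earlier ones.
flat = dict(pairs)
print(flat)  # {'uid': 'aabril', 'objectClass': 'UnabPerson'}

# Keeping every value: map each key to a list (first record only here).
multi = {}
for key, value in pairs[:4]:
    multi.setdefault(key, []).append(value)
print(multi)
```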

You have different ways to solve that:

  • concatenate the various objectClass values into a single field with a different separator. Easy to build, but slightly harder to decode

  • add something to make the column names distinct, for example objectClass, objectClass1, objectClass2. Easy to process, but if you cannot know in advance the possible number of objectClass values for all the records, it will be harder to format your file

  • duplicate the records to write one record per objectClass value

      uid,objectClass
      aabreo,top
      aabreo,inetOrgPerson
      aabreo,UnabPerson
    

Without knowing more of the way you want to use the final file, I cannot guess which way is better for you...
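For comparison, the three options can be sketched with the csv module as follows. All function names and the in-memory `records` list are hypothetical, and the sketch assumes each record has already been parsed into a dict mapping a field name to the list of its values.

```python
import csv
import io

# Hypothetical parsed records: field name -> list of values.
records = [
    {"uid": ["aabreo"], "objectClass": ["top", "inetOrgPerson", "UnabPerson"]},
    {"uid": ["aabril"], "objectClass": ["top", "inetOrgPerson"]},
]

def joined_rows(records, sep="|"):
    """Option 1: join multiple values into one field with another separator."""
    return [{k: sep.join(v) for k, v in rec.items()} for rec in records]

def numbered_rows(records):
    """Option 2: distinct column names objectClass, objectClass1, ..."""
    rows = []
    for rec in records:
        row = {}
        for key, values in rec.items():
            for i, value in enumerate(values):
                row[key if i == 0 else f"{key}{i}"] = value
        rows.append(row)
    return rows

def exploded_rows(records, field="objectClass"):
    """Option 3: one row per value of `field`, repeating the uid."""
    rows = []
    for rec in records:
        for value in rec.get(field, []):
            rows.append({"uid": rec["uid"][0], field: value})
    return rows

def to_csv(rows):
    """Return CSV text; the header is the union of keys in first-seen order."""
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

print(to_csv(joined_rows(records)))
print(to_csv(numbered_rows(records)))
print(to_csv(exploded_rows(records)))
```

In option 2 the header has to be computed over all records before writing, which is the formatting difficulty mentioned above; records with fewer values get empty cells through `restval`. When writing to a real file instead of `StringIO`, opening it with `newline=""`, as the csv module documentation recommends, should avoid the blank lines between rows that appear on Windows.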

Upvotes: 1
