Nabih Ibrahim Bawazir
Nabih Ibrahim Bawazir

Reputation: 641

How to solve ms excel readable comma corupt file in csv file in pandas (works macro code provided)?

I have comma corrupt csv file that can be read by microsoft excel file, but can't be read well using pandas dataframe, I got macro solution, but I want solution that works in python as well, here's first 5 lines of my data

mathematicians,occupation,country of citizenship,place of birth,date of death,educated at,employer,place of death,member of,employer,doctoral advisor,"languages spoken, written or signed",academic degree,doctoral student,manner of death,position held,field of work,award received,Erdős number,instance of,sex or gender,approx. date of birth,day of birth,month of birth,year of birth,approx. date of death,day of death,month of death,year of death
Roger Joseph Boscovich,"['physicist', 'astronomer', 'mathematician', 'philosopher', 'diplomat', 'poet', 'theologian', 'priest', 'polymath', 'historian', 'scientist', 'writer', 'cleric', 'university teacher']",['Republic of Ragusa'],"Dubrovnik, Republic of Ragusa",13 February 1787,['Pontifical Gregorian University'],['Pontifical Gregorian University'],"['Milan', 'Habsburg Empire']","['Royal Society', 'Russian Academy of Sciences', 'Russian Academy of Sciences']",['Pontifical Gregorian University'],,['Latin'],,,,,,['Fellow of the Royal Society'],,['human'],['male'],False,18,May,1711,False,13,February,1787
Emma Previato,['mathematician'],"['United States of America', 'Italy']",Badia Polesine,,"['Harvard University', 'University of Padua']","['Boston University', 'University of Padua']",,['American Mathematical Society'],"['Boston University', 'University of Padua']",['David Mumford'],,,,,,,,,['human'],['female'],False,,,1952,False,,,
Feodor Deahna,['mathematician'],,,1844,,,,,,,,,,,,['differential geometry'],,,['human'],['male'],False,,,1815,False,,,1844
Denis Henrion,"['publisher', 'mathematician']",['France'],,1640,,,,,,,['French'],,,,,,,,['human'],['male'],True,,,1500,False,,,1640

Here's working macro solutoion

For i1 = 1 To 9000
dump = ""
For i2 = 1 To 29
dump = dump & "," & Cells(i1, i2).Value
Next i2
Cells(i1, 30).Value = Mid(dump, 2, 1000)
Next i1

How to convert macro solution to pandas solution?

Upvotes: 0

Views: 91

Answers (1)

Martin Evans
Martin Evans

Reputation: 46759

To convert the list representations used in some of the cells:

import pandas as pd    
import csv 
import ast

data = []

with open('input.csv', 'rb') as f_input:
    for row in csv.reader(f_input):
        for index, v in enumerate(row):
            if v.startswith('['):
                row[index] = ', '.join(ast.literal_eval(v))
        data.append(row)

print pd.DataFrame(data)

For example a cell containing the text:

['United States of America', 'Italy']

Would become:

United States of America, Italy

The dataframe would appear as:

                       0                                                  1                                2                              3                 4                                        5                                       6                       7                                                  8                                       9                 10                                   11               12                13               14             15                     16                           17             18           19             20                     21            22              23             24                     25            26              27             28
0          mathematicians                                         occupation           country of citizenship                 place of birth     date of death                              educated at                                employer          place of death                                          member of                                employer  doctoral advisor  languages spoken, written or signed  academic degree  doctoral student  manner of death  position held          field of work               award received  Erdős number  instance of  sex or gender  approx. date of birth  day of birth  month of birth  year of birth  approx. date of death  day of death  month of death  year of death
1  Roger Joseph Boscovich  physicist, astronomer, mathematician, philosop...               Republic of Ragusa  Dubrovnik, Republic of Ragusa  13 February 1787          Pontifical Gregorian University         Pontifical Gregorian University  Milan, Habsburg Empire  Royal Society, Russian Academy of Sciences, Ru...         Pontifical Gregorian University                                                  Latin                                                                                            Fellow of the Royal Society                       human           male                  False            18             May           1711                  False            13        February           1787
2           Emma Previato                                      mathematician  United States of America, Italy                 Badia Polesine                    Harvard University, University of Padua  Boston University, University of Padua                                              American Mathematical Society  Boston University, University of Padua     David Mumford                                                                                                                                                                                   human         female                  False                                         1952                  False                                             
3           Feodor Deahna                                      mathematician                                                                              1844                                                                                                                                                                                                                                                                                                                                differential geometry                                                    human           male                  False                                         1815                  False                                         1844
4           Denis Henrion                           publisher, mathematician                           France                                             1640                                                                                                                                                                                                                                                     French                                                                                                                                              human           male                   True                                         1500                  False                                         1640

Upvotes: 1

Related Questions