Mansoor Akram
Mansoor Akram

Reputation: 2056

Reformat csv file

I have this csv file with only two entries. Here it is:

Meat One,['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']

First one is title and second is a business headings.

Problem lies with entry two.

Here is my code:

import csv

with open('phonebookCOMPK-Directory.csv', "rt") as textfile:
    reader = csv.reader(textfile)

    for row in reader:
        row5 = row[5].replace("[", "").replace("]", "")
        listt = [(''.join(row5))]
        print (listt[0])

it prints:

'Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers'

What i need to do is that i want to create a list containing these words and then print them like this using for loop to print every item separately:

Abattoirs
Exporters
Food Delivery
Butchers Retail
Meat Dealers-Retail
Meat Freezer
Meat Packers

Actually I am trying to reformat my current csv file and clean it so it can be more precise and understandable.

Complete 1st line of csv is this:

Meat One,+92-21-111163281,Al Shaheer Corporation,Retailers,2008,"['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']","[[' Outlets Address : Shop No. Z-10, Station Shopping Complex, MES Market, Malir-Cantt, Karachi. Landmarks : MES Market, Station Shopping Complex City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi. Landmarks : Nadra Chowrangi, Sky Garden, Tipu Sultan Road City : Karachi UAN : +92-21-111163281 '], ["" Outlets Address : Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi. Landmarks : Boat Basin, Jans Broast, Khayaban-e-Roomi City : Karachi UAN : +92-21-111163281 View Map ""], [' Outlets Address : Gulistan-e-Johar, Karachi. Landmarks : Perfume Chowk City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Tee Emm Mart, Creek Vista Appartments, Khayaban-e-Shaheen, Phase VIII, DHA, Karachi. Landmarks : Creek Vista Appartments, Nueplex Cinema, Tee Emm Mart, The Place City : Karachi Mobile : 0302-8333666 '], [' Outlets Address : Y-Block, DHA, Lahore. Landmarks : Y-Block City : Lahore UAN : +92-42-111163281 '], [' Outlets Address : Adj. PSO, Main Bhittai Road, Jinnah Supermarket, F-7 Markaz, Islamabad. Landmarks : Bhittai Road, Jinnah Super Market, PSO Petrol Pump City : Islamabad UAN : +92-51-111163281 ']]","Agriculture, fishing & Forestry > Farming equipment & services > Abattoirs in Pakistan"

First column is Name
Second column is Number
Third column is Owner
Forth column is Business type
Fifth column is Y.O.E
Sixth column is Business Headings
Seventh column is Outlets (List of lists containing every branch address)
Eighth column is classification

There is no restriction of using csv.reader, I am open to any technique available to clean my file.

Upvotes: 0

Views: 1935

Answers (2)

Anton Strogonoff
Anton Strogonoff

Reputation: 34032

Think of it in terms of two separate tasks:

  • Collect some data items from a ‘dirty’ source (this CSV file)
  • Store that data somewhere so that it’s easy to access and manipulate programmatically (according to what you want to do with it)

Processing dirty CSV

One way to do this is to have a function deserialize_business() to distill structured business information from each incoming line in your CSV. This function can be complex because that’s the nature of the task, but still it’s advisable to split it into self-containing smaller functions (such as get_outlets(), get_headings(), and so on). This function can return a dictionary but depending on what you want it can be a [named] tuple, a custom object, etc.

This function would be an ‘adapter’ for this particular CSV data source.

Example of deserialization function:

def deserialize_business(csv_line):
    """
    Distills structured business information from given raw CSV line.
    Returns a dictionary like {name, phone, owner,
    btype, yoe, headings[], outlets[], category}.
    """

    pieces = [piece.strip("[[\"\']] ") for piece in line.strip().split(',')]

    name = pieces[0]
    phone = pieces[1]
    owner = pieces[2]
    btype = pieces[3]
    yoe = pieces[4]

    # after yoe headings begin, until substring Outlets Address
    headings = pieces[4:pieces.index("Outlets Address")]

    # outlets go from substring Outlets Address until category
    outlet_pieces = pieces[pieces.index("Outlets Address"):-1]

    # combine each individual outlet information into a string
    # and let ``deserialize_outlet()`` deal with that
    raw_outlets = ', '.join(outlet_pieces).split("Outlets Address")
    outlets = [deserialize_outlet(outlet) for outlet in raw_outlets]

    # category is the last piece
    category = pieces[-1]

    return {
        'name': name,
        'phone': phone,
        'owner': owner,
        'btype': btype,
        'yoe': yoe,
        'headings': headings,
        'outlets': outlets,
        'category': category,
    }

Example of calling it:

with open("phonebookCOMPK-Directory.csv") as f:
    lineno = 0

    for line in f:
        lineno += 1

        try:
            business = deserialize_business(line)

        except:
            # Bad line formatting?
            log.exception(u"Failed to deserialize line #%s!", lineno)

        else:
            # All is well
            store_business(business)

Storing the data

You’ll have the store_business() function take your data structure and write it somewhere. Maybe it’ll be another CSV that’s better structured, maybe multiple CSVs, a JSON file, or you can make use of SQLite relational database facilities since Python has it built-in.

It all depends on what you want to do later.

Relational example

In this case your data would be split across multiple tables. (I’m using the word “table” but it can be a CSV file, although you can as well make use of an SQLite DB since Python has that built-in.)

Table identifying all possible business headings:

business heading ID, name
1, Abattoirs
2, Exporters
3, Food Delivery
4, Butchers Retail
5, Meat Dealers-Retail
6, Meat Freezer
7, Meat Packers

Table identifying all possible categories:

category ID, parent category, name
1, NULL, "Agriculture, fishing & Forestry"
2, 1, "Farming equipment & services"
3, 2, "Abattoirs in Pakistan"

Table identifying businesses:

business ID, name, phone, owner, type, yoe, category
1, Meat One, +92-21-111163281, Al Shaheer Corporation, Retailers, 2008, 3

Table describing their outlets:

business ID, city, address, landmarks, phone
1, Karachi UAN, "Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi", "Nadra Chowrangi, Sky Garden, Tipu Sultan Road", +92-21-111163281
1, Karachi UAN, "Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi", "Boat Basin, Jans Broast, Khayaban-e-Roomi", +92-21-111163281

Table describing their headings:

business ID, business heading ID
1, 1
1, 2
1, 3
…

Handling all this would require a complex store_business() function. It may be worth looking into SQLite and some ORM framework, if going with relational way of keeping the data.

Upvotes: 1

nino_701
nino_701

Reputation: 692

You can just replace the line :

print(listt[0])

with :

print(*listt[0], sep='\n')

Upvotes: 0

Related Questions