shantanuo
shantanuo

Reputation: 32316

Rearrange CSV data

I have 2 csv files with different sequence of columns. For e.g. the first file starts with 10 digits mobile numbers while that column is at number 4 in the second file. I need to merge all the customer data into a single csv file. The order of the columns should be as follows:

mobile pincode model Name Address Location pincode date

mobile  Name    Address Model   Location    pincode Date
9845299999  Raj Shah    nagar No 22 Rivi Building 7Th Main I    Crz Mumbai      17/02/2011
9880877777  Managing Partner M/S Aitas  # 1010, 124Th Main, Bk Stage.  -    Bmw 320 D   Hyderabad   560070  30-Dec-11


Name    Address Location    mobile  pincode Date    Model   
Asvi Developers pvt Ltd fantry Road Nariman Point, 1St Floor, No. 150   Chennai 9844066666      13/11/2011  Crz 
L R Shiva Gaikwad & Sudha Gaikwad   # 42, Suvarna Mansion, 1St Cross, 17Th Main, Banjara Hill, B S K Stage,-    Bangalore   9844233333  560085  40859   Mercedes_E 350 Cdi  

Second task and that may be slightly difficult is that the new files expected may have a totally different column sequence. In that case I need to extract 10 digits mobile number and 6 digits pincode column. I need to write the code that will guess the city column if it matches with any of the given city list. The new files are expected to have relevant column headings but the column heading may be slightly different. for e.g. "customer address" instead of "address". How do I handle such data?

sed 's/.*\([0-9]\{10\}\).*/\1,&/' input

I have been suggested to use sed to rearrange the 10 digits column at the beginning. But I do also need to rearrange the text columns. For e.g. if a column matches the entries in the following list then it is undoubtedly model column.

['Crz', 'Bmw 320 D', 'Benz', 'Mercedes_E 350 Cdi', 'Toyota_Corolla He 1.8']

If any column matches 10% of the entries with the above list then it is a "model" column and should be at number 3 followed by mobile and pincode.

Upvotes: 2

Views: 749

Answers (1)

Happy001
Happy001

Reputation: 6383

For your first question, I suggest using pandas to load both files and then concat. After that you can rearrange your columns.

import pandas as pd
dataframe1 = pd.read_csv('file1.csv')
dataframe2 = pd.read_csv('file2.csv')
combined = pd.concat([dataframe1, dataframe2]) #the columns will be ordered alphabetically

To get desired order,

result_df = combined[['mobile', 'pincode', 'model', 'Name', 'Address', 'Location', 'pincode', 'date']]

and then result_df.to_csv('oupput.csv', index=False) to export to csv file.

For the second one, you can do something like this (assuming you have loaded a csv file into df like above)

match_model = lambda m: m in ['Crz', 'Bmw 320 D', 'Benz', 'Mercedes_E 350 Cdi',   'Toyota_Corolla He 1.8']

for c in df:
    if df[c].map(match_model).sum()/len(df) > 0.1:
        print "Column %s is 'Model'"% c
        df.rename(columns={c:'Model'}, inplace=True)

You can modify the matching function match_model to use regex instead if you want.

Upvotes: 1

Related Questions