TroyPGoff
TroyPGoff

Reputation: 45

Mapping CSV Header using a Dictionary

I have a reference file that looks like this:

Experiment,Array,Drug
8983,Genechip,Famotidine
8878,Microarray,Dicyclomine
8988,Genechip,Etidronate
8981,Microarray,Flunarizine

I successfully created a dictionary mapping the Experiment numbers to the Drug name using the following:

reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
#Configure dictionary
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result

I want to map this dictionary to the header of another file which consists of the experiment number. It currently looks like this:

Gene,8988,8981,8878,8983
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011

But it should look like this:

Gene,Etidronate,Flunarizine,Dicyclomine,Famotidine
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011

I tried using:

import csv
import pandas as pd

reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
result = {}
    for row in reader:
    key = row[0]
    result[key] = row[2]
    di = result
df = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')
df['row[0]'].replace(di, inplace=True)

but it returned a KeyError: 'row[0]'.

I tried the following as well, even transposing in order to merge:

import pandas as pd

df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt',).transpose()
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', delimiter=',', engine='python')
df3 = df1.merge(df2)
df4 = df3.set_index('Drug').drop(['Experiment', 'Array'], axis=1)
df4.index.name = 'Drug'
print df4

and this time received MergeError('No common columns to perform merge on').

Is there a simpler way to map my dictionary to the header that would work?

Upvotes: 2

Views: 2168

Answers (2)

Navidad20
Navidad20

Reputation: 832

I used csv for the whole script. This fixes the header you wanted and saves into a new file. The new filename can be replaced with the same one if that's what you prefer. This program is written with python3.

import csv 

with open('sample.txt', 'r') as ref:
    reader = csv.reader(ref)
    # skip header line
    next(reader)
    # make dictionary
    di = dict([(row[0], row[2]) for row in reader])

data = []
with open('sample1.txt', 'r') as df:
    reader = csv.reader(df)
    header = next(reader)
    new_header = [header[0]] + [di[i] for i in header if i in di]
    data = list(reader)

# used to make new file, can also replace with the same file name
with open('new_sample1.txt', 'w') as df_new:
    writer = csv.writer(df_new)
    writer.writerow(new_header)
    writer.writerows(data)

Upvotes: 1

Nickil Maveli
Nickil Maveli

Reputation: 29711

One of the things to keep in mind would be to making sure that both the keys corresponding to the mapper dictionary as well as the header which it is mapped to are of the same data type.

Here, one is a string and the other of integer type. So while reading itself, we'll let it not interpret dtype by setting it to str for the reference DF.

df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')               # Original
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', dtype=str)  # Reference

Convert the columns of the original DF to it's series representation and then replace the old value which were Experiment Nos. with the new Drug name retrieved from the reference DF.

df1.columns = df1.columns.to_series().replace(df2.set_index('Experiment').Drug)
df1

enter image description here

Upvotes: 2

Related Questions