Mapping CSV Header using a Dictionary

Question

I have a reference file that looks like this:

Experiment,Array,Drug
8983,Genechip,Famotidine
8878,Microarray,Dicyclomine
8988,Genechip,Etidronate
8981,Microarray,Flunarizine

I successfully created a dictionary mapping the Experiment numbers to the Drug name using the following:

reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
#Configure dictionary
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result

I want to map this dictionary to the header of another file which consists of the experiment number. It currently looks like this:

Gene,8988,8981,8878,8983
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011

But it should look like this:

Gene,Etidronate,Flunarizine,Dicyclomine,Famotidine
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011

I tried using:

import csv
import pandas as pd

reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
result = {}
    for row in reader:
    key = row[0]
    result[key] = row[2]
    di = result
df = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')
df['row[0]'].replace(di, inplace=True)

but it returned a KeyError: 'row[0]'.

I tried the following as well, even transposing in order to merge:

import pandas as pd

df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt',).transpose()
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', delimiter=',', engine='python')
df3 = df1.merge(df2)
df4 = df3.set_index('Drug').drop(['Experiment', 'Array'], axis=1)
df4.index.name = 'Drug'
print df4

and this time received MergeError('No common columns to perform merge on').

Is there a simpler way to map my dictionary to the header that would work?

Nickil Maveli · Accepted Answer

One of the things to keep in mind would be to making sure that both the keys corresponding to the mapper dictionary as well as the header which it is mapped to are of the same data type.

Here, one is a string and the other of integer type. So while reading itself, we'll let it not interpret dtype by setting it to str for the reference DF.

df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')               # Original
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', dtype=str)  # Reference

Convert the columns of the original DF to it's series representation and then replace the old value which were Experiment Nos. with the new Drug name retrieved from the reference DF.

df1.columns = df1.columns.to_series().replace(df2.set_index('Experiment').Drug)
df1

Mapping CSV Header using a Dictionary

Answers (2)

Related Questions