Reputation: 1792
I have the following csv:
Name1 Name2
JSMITH J Smith
ASMITH A Smith
How can I read it into a dictionary so that the output is
dict = {'JSMITH':'J Smith', 'ASMITH': 'A Smith'}
I have used:
import pandas as pd
df = pd.read_csv('data.csv')
data_dict = df.to_dict(orient='list')
but it gives me
{'Name1': ['JSMITH','ASMITH'],'Name2': ['J Smith', 'A Smith']}
I am then hoping to use it as a mapping in pandas, such as:
df2['Name'] = df2['Name'].replace(data_dict, regex=True)
Any help would be much appreciated!
Upvotes: 4
Views: 1028
Reputation: 29742
A trick if you always have exactly two columns:
dict(df.itertuples(index=False, name=None))
Or make it a pandas.Series and use to_dict:
df.set_index("Name1")["Name2"].to_dict()
Output:
{'ASMITH': 'A Smith', 'JSMITH': 'J Smith'}
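For reference, here is a minimal self-contained sketch of the approaches above. It assumes the file is comma-separated (the separators are not visible in the question) and uses an in-memory string in place of data.csv; the variable names are only illustrative. The zip variant is an extra alternative, not something from the question.

```python
import io

import pandas as pd

# Assumption: the real data.csv is comma-separated; an in-memory copy stands in for it.
csv_text = "Name1,Name2\nJSMITH,J Smith\nASMITH,A Smith\n"
df = pd.read_csv(io.StringIO(csv_text))

# Two-column trick: each row becomes a plain (key, value) tuple.
by_tuples = dict(df.itertuples(index=False, name=None))

# Series route: index on the key column, then convert the value column.
by_series = df.set_index("Name1")["Name2"].to_dict()

# Alternative (not from the question): pair the two columns explicitly.
by_zip = dict(zip(df["Name1"], df["Name2"]))

print(by_tuples)  # {'JSMITH': 'J Smith', 'ASMITH': 'A Smith'}
```

The zip and set_index variants name the columns explicitly, so they keep working if the frame later gains more columns, whereas the itertuples trick requires exactly two.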
Note that if you need a mapper for pd.Series.replace, a Series works just as well as a dict.
s = df.set_index("Name1")["Name2"]
df["Name1"].replace(s, regex=True)
0 J Smith
1 A Smith
Name: Name1, dtype: object
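Since the question mentions both a "map" and replace with regex=True, here is a rough sketch of how the two differ when given a plain dict. The df2 contents (including the unmatched "BJONES" row) are made up for illustration.

```python
import pandas as pd

mapping = {"JSMITH": "J Smith", "ASMITH": "A Smith"}

# Hypothetical target frame; "BJONES" has no entry in the mapping.
df2 = pd.DataFrame({"Name": ["JSMITH", "ASMITH", "BJONES"]})

# map: exact-match lookup, unmatched values become NaN.
mapped = df2["Name"].map(mapping)

# replace without regex: exact-match lookup, unmatched values are left as-is.
replaced = df2["Name"].replace(mapping)

# replace with regex=True: keys are treated as patterns, so substrings match too.
partial = pd.Series(["call JSMITH now"]).replace(mapping, regex=True)

print(mapped.tolist())    # ['J Smith', 'A Smith', nan]
print(replaced.tolist())  # ['J Smith', 'A Smith', 'BJONES']
print(partial.tolist())   # ['call J Smith now']
```

If the column holds whole keys only, regex=True is not needed; it matters when the keys appear inside longer strings, and in that case keys containing regex metacharacters would need escaping.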
Since replace accepts the Series directly, you can also skip to_dict and cut some overhead:
large_df = df.sample(n=100000, replace=True)
%timeit large_df.set_index("Name1")["Name2"]
# 4.76 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit large_df.set_index("Name1")["Name2"].to_dict()
# 20.2 ms ± 976 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
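%timeit is an IPython magic, so outside a notebook the same comparison could be sketched with the standard timeit module. The repeat count below is an arbitrary choice, and the timings will differ by machine and pandas version.

```python
import io
import timeit

import pandas as pd

# Assumption: same two-row, comma-separated data as in the question.
csv_text = "Name1,Name2\nJSMITH,J Smith\nASMITH,A Smith\n"
df = pd.read_csv(io.StringIO(csv_text))
large_df = df.sample(n=100000, replace=True)

runs = 10  # arbitrary number of repetitions

# Building the Series only.
t_series = timeit.timeit(lambda: large_df.set_index("Name1")["Name2"], number=runs)

# Building the Series and then converting it to a dict.
t_dict = timeit.timeit(
    lambda: large_df.set_index("Name1")["Name2"].to_dict(), number=runs
)

print(f"Series only: {t_series / runs * 1e3:.2f} ms per run")
print(f"With to_dict: {t_dict / runs * 1e3:.2f} ms per run")
```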
Upvotes: 3