Reputation: 573
I have been trying this for a while and I am stuck. Here it is the problem:
I am working with some metadata about texts that I have in CSV files. It looks like this:
The real table is longer and more complex, but it follows the same logic: every row is a text and every column is different aspects of the text. I have in some of the columns to much variation and I want it to remodel in a simpler one. For example changing from the narrative-perspective the values of homodiegetic and autodiegetic to non-heterodiegetic. I define this new model in another CSV file called keywords that looks like this:
As you can see, every column of the metadata becomes a row in the new model-keywords, where the old value is in the term_value column and the new value is in the new_model column.
So I need to map or replace this values using Pandas. This is what I have got for now:
import re
import pandas as pd
df_metadata = pd.read_csv("/metadata.csv", encoding="utf-8", sep=",")
df_keywords = pd.read_csv("/keywords.csv", encoding="utf-8", sep="\t")
for column_metadata,value_metadata in df_metadata.iteritems():
if str(column_metadata) in list(df_keywords.loc[:,"term_type"]):
df_metadata.loc[df_metadata[column_metadata] == value_metadata, column_metadata] = df_keywords.loc[df_keywords["term_value"] == value_metadata, ["new_model"]]
And Python always gives this error back:
"ValueError: Series lengths must match to compare"
I think the problem is in the value_metadata of the second part of the replace with loc, I mean here:
df_keywords.loc[df_keywords["term_value"] == value_metadata, ["new_model"]]
The thing I don't understand is why value_metadata works in the first part of this command but it doesn't in the second one...
Please, I would appreciate any help. Maybe there is a simpler way than iterate through the dataframes... I am very open to any suggestion. Best regards, José
Upvotes: 1
Views: 267
Reputation: 862751
You can first create Multiindex
in df_keywords
for faster selecting by slicers and in loop map
new value by old one:
df_keywords.set_index(['term_type','term_value'], inplace=True)
idx = pd.IndexSlice
#first maping in column narrative-perspective
print (df_keywords.loc[idx['narrative-perspective',:]].to_dict()['new_model'])
{'heterodiegetic': 'heterodiegetic', 'other/mixed': 'n-heterodiegetic',
'homodiegetic': 'n-heterodiegetic', 'autodiegetic': 'n-heterodiegetic'}
#column names for replacing
L = ['narrative-perspective','narrator','protagonist-gender']
for col in L:
df_metadata[col] =
df_metadata[col].map(df_keywords.loc[idx[col,:]].to_dict()['new_model'])
print (df_metadata)
idno author-name narrative-perspective narrator protagonist-gender
0 ne0001 Baroja n-heterodiegetic third-person male
1 ne0002 Galdos heterodiegetic third-person n-male
2 ne0003 Galdos n-heterodiegetic third-person male
3 ne0004 Galdos n-heterodiegetic third-person n-male
4 ne0005 Galdos heterodiegetic third-person n-male
5 ne0006 Galdos heterodiegetic third-person male
6 ne0007 Sawa heterodiegetic third-person n-male
7 ne0008 Zamacois heterodiegetic third-person n-male
8 ne0009 Galdos heterodiegetic third-person n-male
9 ne0011 Galdos n-heterodiegetic n-third-person male
Also to_dict
can be omited and then map by Series
:
df_keywords.set_index(['term_type','term_value'], inplace=True)
idx = pd.IndexSlice
#first maping in column narrative-perspective
print (df_keywords.loc[idx['narrative-perspective',:]]['new_model'])
term_value
autodiegetic n-heterodiegetic
heterodiegetic heterodiegetic
homodiegetic n-heterodiegetic
other/mixed n-heterodiegetic
Name: new_model, dtype: object
L = ['narrative-perspective','narrator','protagonist-gender']
for col in L:
df_metadata[col] = df_metadata[col].map(df_keywords.loc[idx[col,:]]['new_model'])
print (df_metadata)
idno author-name narrative-perspective narrator protagonist-gender
0 ne0001 Baroja n-heterodiegetic third-person male
1 ne0002 Galdos heterodiegetic third-person n-male
2 ne0003 Galdos n-heterodiegetic third-person male
3 ne0004 Galdos n-heterodiegetic third-person n-male
4 ne0005 Galdos heterodiegetic third-person n-male
5 ne0006 Galdos heterodiegetic third-person male
6 ne0007 Sawa heterodiegetic third-person n-male
7 ne0008 Zamacois heterodiegetic third-person n-male
8 ne0009 Galdos heterodiegetic third-person n-male
9 ne0011 Galdos n-heterodiegetic n-third-person male
Upvotes: 1