Reputation: 8177
I have two DataFrames:
df_components
: list of unique components (ID, DESCRIPTION) dataset
: several rows and columns from a CSV (one of these columns contains the description of a component).I need to create a new column in the dataset
with the ID of the component according to the df_components
.
I tried to do this way:
Creating the df_components
and the ID
column based on the index
components = dataset["COMPDESC"].unique()
df_components = pd.DataFrame(components, columns=['DESCRIPTION'])
df_components.sort_values(by='DESCRIPTION', ascending=True, inplace=True)
df_components.reset_index(drop=True, inplace=True)
df_components.index += 1
df_components['ID'] = df_components.index
Sample output:
DESCRIPTION ID
1 AIR BAGS 1
2 AIR BAGS:FRONTAL 2
3 AIR BAGS:FRONTAL:SENSOR/CONTROL MODULE 3
4 AIR BAGS:SIDE/WINDOW 4
Create the COMP_ID
in the dataset:
def create_component_id_column(row):
found = df_components[df_components['DESCRIPTION'] == row['COMPDESC']]
return found.ID if len(found.index) > 0 else None
dataset['COMP_ID'] = dataset.apply(lambda row: create_component_id_column(row), axis=1)
However this gives me the error ValueError: Wrong number of items passed 248, placement implies 1
. Being 248 the number of items on df_components
.
How can I create this new column with the ID from the item found on df_components
?
Upvotes: 2
Views: 104
Reputation: 164613
Your logic seems overcomplicated. Since you are currently creating df_components
from dataset
, a better idea would be to use Categorical Data with dataset
. This means you do not need to create df_components
.
Step 1
Convert dataset['COMPDESC']
to categorical.
dataset['COMPDESC'] = dataset['COMPDESC'].astype('category')
Step 2
Create ID from categorical codes. Since categories are alphabetically sorted by default and indexing starts from 0, add 1 to the codes.
dataset['ID'] = dataset['COMPDESC'].cat.codes + 1
If you wish, you can extract the entire categorical mapping to a dictionary:
cat_map = dict(enumerate(dataset['COMPDESC'].cat.categories))
Remember that there always be a 1-offset if you want your IDs to begin at 1. In addition, you will need to update 'ID' explicitly every time 'DESCRIPTION' changes.
Advantages of using categorical data
Upvotes: 2