Reputation: 51
I have two dataframes as shown below, and I'm trying to set value to CLASSIFICATION in dataframe1 based on dataframe2's ITEM/CODE. If DESC has any matching word to ITEM/CODE then I have to get TYPE from dataframe2.
To achieve this, I am splitting the DESC string, and trying to compare the list entries with ITEM/CODE of dataframe2. Any Ideas on how to do this?
dataframe1
PN DESC CLASSIFICATION
C23890 Resistor 2.21K elec
C23891 Powerswitch
C23892 Resistor 7.5K
C23893 Resistor .1K
C23894 FET elec
C23895 ELE SD Card adapter
C23896 Crystal 16Mhz
C23897 Capacitor 100uF
C23898 ELECTRONICS Resistor 10K
C23899 M3x5 Socket Cap Bolt MECH
C23900 M3x6 Socket Cap Bolt Mech
C23901 Mehcanical Assemble Kapton Tape 120mm
C23902 MK7 Filament Drive Block Front
C23903 Pulley 5mm shaft
dataframe2
ITEM/CODE TYPE
ELE ELECTRONIC
ELECTRONICS ELECTRONIC
Capacitor ELECTRONIC
Resistor ELECTRONIC
Washer MECHANICAL
MECH MECHANICAL
This is what I have coded up so far.
import pandas as pd
fn = 'D:\PartsExport.xlsx'
dfInput = pd.read_excel(fn, 'Sheet1')
fn_type = 'D:\TypeMaster.xlsx'
dfType = pd.read_excel(fn_type, 'Sheet1')
dfInput['DESC_SPLIT'] = dfInput["DESC"].str.split(" ", n=-1, expand = False)
Result
PN DESC CLASSIFICATION
C23890 Resistor 2.21K elec ELECTRONIC
C23891 Powerswitch ELECTRONIC
C23892 Resistor 7.5K ELECTRONIC
C23893 Resistor .1K ELECTRONIC
C23899 M3x5 Socket Cap Bolt MECH MECHANICAL
Upvotes: 0
Views: 50
Reputation: 3011
Not as fancy, but should do the job:
import pandas as pd
#convert dfType dataframe to dictionary
type_dict = dfType.set_index('ITEM/CODE').T.to_dict()
#function that takes in DESC column value and outputs corresponding value from type_dict
def map_type(in_str):
out_str = np.NaN
for val in in_str.split():
if val in type_dict.keys():
out_str = type_dict[val]['TYPE']
return out_str
#apply above function to DESC column
dfInput['CLASSIFICATION'] = dfInput['DESC'].apply(map_type)
Upvotes: 1
Reputation: 863226
Use Series.str.contains
with looping by Series created by dataframe2
, flags=re.I
parameter is for non case matching:
import re
for k, v in dataframe2.set_index('ITEM/CODE')['TYPE'].items():
#if necessary word boundaries
pat = r"\b{}\b".format(k)
#if not
#pat = k
dataframe1.loc[dataframe1['DESC'].str.contains(pat, flags=re.I), 'CLASSIFICATION'] = v
print (dataframe1)
PN DESC CLASSIFICATION
0 C23890 Resistor 2.21K elec ELECTRONIC
1 C23891 Powerswitch NaN
2 C23892 Resistor 7.5K ELECTRONIC
3 C23893 Resistor .1K ELECTRONIC
4 C23894 FET elec NaN
5 C23895 ELE SD Card adapter ELECTRONIC
6 C23896 Crystal 16Mhz NaN
7 C23897 Capacitor 100uF ELECTRONIC
8 C23898 ELECTRONICS Resistor 10K ELECTRONIC
9 C23899 M3x5 Socket Cap Bolt MECH MECHANICAL
10 C23900 M3x6 Socket Cap Bolt Mech MECHANICAL
11 C23901 Mehcanical Assemble Kapton Tape 120mm NaN
12 C23902 MK7 Filament Drive Block Front NaN
13 C23903 Pulley 5mm shaft NaN
If want match only first word use Series.map
, but first convert both values to lowercase by Series.str.lower
:
dataframe2['ITEM/CODE'] = dataframe2['ITEM/CODE'].str.lower()
s = dataframe2.set_index('ITEM/CODE')['TYPE']
dataframe1['CLASSIFICATION'] = dataframe1['DESC'].str.split().str[0].str.lower().map(s)
print (dataframe1)
PN DESC CLASSIFICATION
0 C23890 Resistor 2.21K elec ELECTRONIC
1 C23891 Powerswitch NaN
2 C23892 Resistor 7.5K ELECTRONIC
3 C23893 Resistor .1K ELECTRONIC
4 C23894 FET elec NaN
5 C23895 ELE SD Card adapter ELECTRONIC
6 C23896 Crystal 16Mhz NaN
7 C23897 Capacitor 100uF ELECTRONIC
8 C23898 ELECTRONICS Resistor 10K ELECTRONIC
9 C23899 M3x5 Socket Cap Bolt MECH NaN
10 C23900 M3x6 Socket Cap Bolt Mech NaN
11 C23901 Mehcanical Assemble Kapton Tape 120mm NaN
12 C23902 MK7 Filament Drive Block Front NaN
13 C23903 Pulley 5mm shaft NaN
Upvotes: 1