Reputation: 9

Wildcard in python dictionary

I am trying to create a Python dictionary that maps codes such as 'WHM1', 'WHM2', 'WHM3', 'HISPM1', 'HISPM2', 'HISPM3', and similar variants to a specific string, for example White or Hispanic, in a new column. Using regex seems like the right path, but I am missing something and would rather not hard-code every code in the dictionary.

I have tried several variations of regex and regexdict:

d = regexdict({'W*':'White', 'H*':'Hispanic'})
eeoc_nac2_All_unpivot_df['Race'] = eeoc_nac2_All_unpivot_df['EEOC_Code'].map(d)

The expected result is a new column containing 'White' or 'Hispanic' for each row, based on the value in the existing column 'EEOC_Code'.

Upvotes: 0

Answers (2)

tripleee

Reputation: 189377

Your regular expressions are wrong - you appear to be using glob syntax instead of proper regular expressions.

In regex, x* means "zero or more of x" and so both your regexes will trivially match the empty string. You apparently mean

d = regexdict({'^W':'White', '^H':'Hispanic'})

instead, where the regex anchor ^ matches beginning of string.
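
To see the difference concretely, here is a small check using Python's standard re module rather than regexdict itself, whose matching rules may differ. The sample codes are the ones from the question.

```python
import re

codes = ['WHM1', 'HISPM1']

# Glob-style 'W*' is a valid regex, but it means "zero or more W",
# so it matches the empty string at the start of any input.
glob_style = [bool(re.match('W*', code)) for code in codes]

# '^W' requires an actual 'W' at the beginning of the string.
anchored = [bool(re.search('^W', code)) for code in codes]

print(glob_style)  # [True, True]
print(anchored)    # [True, False]
```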

There are several third-party packages named regexdict, so you should probably point out which one you use. I can't tell whether the ^ is necessary here, or whether the regexes need to match the input completely (I have assumed a substring match is sufficient, as is usually the case in regex), because this sort of detail may well differ between implementations.
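
If depending on a particular regexdict package is undesirable, the same lookup can be sketched with the standard re module alone. The names PATTERNS and race_for_code are hypothetical, and this is an illustration under those assumptions, not a description of how any regexdict package works internally.

```python
import re

# Hypothetical pattern table: the first pattern that matches wins.
PATTERNS = {
    r'^W': 'White',
    r'^H': 'Hispanic',
}

def race_for_code(code):
    """Return the label of the first matching pattern, or None if none match."""
    for pattern, label in PATTERNS.items():
        if re.search(pattern, code):
            return label
    return None

labels = [race_for_code(c) for c in ['WHM1', 'HISPM2', 'XYZ']]
print(labels)  # ['White', 'Hispanic', None]
```

With pandas, a function like this can be passed to Series.map, for example eeoc_nac2_All_unpivot_df['EEOC_Code'].map(race_for_code), assuming every value in the column is a string.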

Upvotes: 1

Neb

Reputation: 2280

I'm not sure I have completely understood your problem. However, if all your labels have the structure WHM... and HISP..., you can simply check the first character:

# Build the whole column in one assignment; assigning to the column
# inside a loop would overwrite every row on each iteration.
eeoc_nac2_All_unpivot_df['Race'] = [
    "White" if code.startswith('W') else "Hispanic"
    for code in eeoc_nac2_All_unpivot_df['EEOC_Code']
]

Note: this only works if every value in eeoc_nac2_All_unpivot_df['EEOC_Code'] is a string; a missing value (NaN) has no startswith method.
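
One caveat with the else fallback above is that any code not starting with 'W' ends up labelled Hispanic. A minimal sketch of a prefix lookup that leaves unknown codes unlabelled is shown here; PREFIXES and label_for are hypothetical names, and 'ASNM1' is a made-up code used only to show the fallback.

```python
# Hypothetical prefix table based on the codes mentioned in the question.
PREFIXES = {'WH': 'White', 'HISP': 'Hispanic'}

def label_for(code):
    """Return the label whose prefix starts the code, or None if unknown."""
    for prefix, label in PREFIXES.items():
        if code.startswith(prefix):
            return label
    return None

codes = ['WHM1', 'WHM2', 'HISPM1', 'ASNM1']  # 'ASNM1' is invented for the demo
result = [label_for(c) for c in codes]
print(result)  # ['White', 'White', 'Hispanic', None]
```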

Upvotes: 0
