brSoccer

Reputation: 1

Filtering dict to dataframe

I have a problem filtering a dataframe into a dict of dataframes.

I have a dataframe:

location recipient material type colour
store bottle ZN_PLASTIC bin red
store bottle ZN_PLASTIC_GR bin red
store bottle ZN_PLASTIC_BL bin red
store bottle ZN_PLASTIC_WH bin red
store bottle ZN_PLASTIC_TP bin red
store bottle ZN_GLASS bin green
store bottle ZN_GLASS_GR bin green
store bottle ZN_GLASS_BL bin green
store bottle ZN_GLASS_WH bin green
store bottle ZN_GLASS_TP bin green

Create dataframes by category of material:

plastic = data.loc[data['material'].str.contains('PLASTIC', na=False)]
glass = data.loc[data['material'].str.contains('GLASS', na=False)]

Create a dict for types of plastic:

plastic_dict = {}
for klass in plastic['material'].unique():
    plastic_dict[klass] = plastic[plastic['material'].str.contains(klass)]

Display:

plastic_dict.keys()

Output:

dict_keys(['ZN_PLASTIC', 'ZN_PLASTIC_GR', 'ZN_PLASTIC_BL', 'ZN_PLASTIC_WH', 'ZN_PLASTIC_TP'])

Create a dict for types of glass:

glass_dict = {}
for klass in glass['material'].unique():
    glass_dict[klass] = glass[glass['material'].str.contains(klass)]

Display:

glass_dict.keys()

Output:

dict_keys(['ZN_GLASS', 'ZN_GLASS_GR', 'ZN_GLASS_BL', 'ZN_GLASS_WH', 'ZN_GLASS_TP'])

Now, I'm trying to filter some data using the dict and create a dataframe:

ac_plastic_ = {}
for i in plastic_dict.keys():
    locals()[f"ac_plastic_{i}"] = plastic_dict[i]
    locals()[f"ac_plastic_{i}"].to_csv (r'ac_plastic_' + str(i) + '.txt', index = None, header=False, sep='\t', encoding='utf-8')

But the filter fails for the shortest key, and I get every row:

display(ac_plastic_ZN_PLASTIC)

Output:

location recipient material type colour
store bottle ZN_PLASTIC bin red
store bottle ZN_PLASTIC_GR bin red
store bottle ZN_PLASTIC_BL bin red
store bottle ZN_PLASTIC_WH bin red
store bottle ZN_PLASTIC_TP bin red

For the more specific keys the filter works:

display(ac_plastic_ZN_PLASTIC_GR)

Output:

location recipient material type colour
store bottle ZN_PLASTIC_GR bin red

I have tried to fix this without success. How can I solve this problem?

Thanks

Upvotes: 0

Views: 58

Answers (1)

HeytalePazguato

Reputation: 380

The issue is that you use pandas.Series.str.contains when you build the dictionaries, for example:

glass_dict = {}
for klass in glass['material'].unique():
    glass_dict[klass] = glass[glass['material'].str.contains(klass)]

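str.contains does a substring match (and treats the pattern as a regular expression by default), so for klass = 'ZN_PLASTIC' you get every row whose material contains ZN_PLASTIC, including ZN_PLASTIC_GR, ZN_PLASTIC_BL and so on. What you need is an exact match on klass: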

plastic_dict = {}
for klass in plastic['material'].unique():
    plastic_dict[klass] = plastic.loc[plastic['material'] == klass]
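
To make the difference concrete, here is a minimal sketch on a small sample frame. The sample data is an assumption that mirrors the plastic rows from the question; it is not the asker's real dataset.

```python
import pandas as pd

# Hypothetical sample mirroring the plastic rows shown in the question.
data = pd.DataFrame({
    "location": ["store"] * 5,
    "recipient": ["bottle"] * 5,
    "material": ["ZN_PLASTIC", "ZN_PLASTIC_GR", "ZN_PLASTIC_BL",
                 "ZN_PLASTIC_WH", "ZN_PLASTIC_TP"],
    "type": ["bin"] * 5,
    "colour": ["red"] * 5,
})

# Substring match: 'ZN_PLASTIC' occurs inside every material value,
# so every row is selected.
by_contains = data.loc[data["material"].str.contains("ZN_PLASTIC")]

# Exact match: only the row whose material equals 'ZN_PLASTIC'.
by_equality = data.loc[data["material"] == "ZN_PLASTIC"]

print(len(by_contains))  # 5
print(len(by_equality))  # 1
```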

Note: Prefer loc or iloc when selecting rows to create partial dataframes; otherwise you might later run into warnings about slices or copies (such as SettingWithCopyWarning) when you modify the result.
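
As an alternative to the loop in this answer (a different technique, not something the answer itself proposes), the dictionary could probably also be built with DataFrame.groupby, which splits on exact values of the column. A minimal sketch, assuming a small hypothetical plastic frame:

```python
import pandas as pd

# Hypothetical stand-in for the 'plastic' dataframe from the question.
plastic = pd.DataFrame({
    "material": ["ZN_PLASTIC", "ZN_PLASTIC_GR", "ZN_PLASTIC_GR", "ZN_PLASTIC_BL"],
    "colour": ["red", "red", "red", "red"],
})

# groupby partitions rows by the exact value of 'material', so
# 'ZN_PLASTIC' does not pick up 'ZN_PLASTIC_GR' or 'ZN_PLASTIC_BL'.
plastic_dict = {klass: group for klass, group in plastic.groupby("material")}

for klass, group in plastic_dict.items():
    print(klass, len(group))
```

Each frame in plastic_dict can then be written out with to_csv inside the same loop, which avoids creating variables through locals().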

Upvotes: 1
