Reputation: 1
Problem filtering a dict of dataframes
I have a dataframe:
location | recipient | material | type | colour |
---|---|---|---|---|
store | bottle | ZN_PLASTIC | bin | red |
store | bottle | ZN_PLASTIC_GR | bin | red |
store | bottle | ZN_PLASTIC_BL | bin | red |
store | bottle | ZN_PLASTIC_WH | bin | red |
store | bottle | ZN_PLASTIC_TP | bin | red |
store | bottle | ZN_GLASS | bin | green |
store | bottle | ZN_GLASS_GR | bin | green |
store | bottle | ZN_GLASS_BL | bin | green |
store | bottle | ZN_GLASS_WH | bin | green |
store | bottle | ZN_GLASS_TP | bin | green |
Create dataframes by category of material:
plastic = data.loc[data['material'].str.contains('PLASTIC') == True]
glass = data.loc[data['material'].str.contains('GLASS') == True]
Create a dict for types of plastic:
plastic_dict = {}
for klass in plastic['material'].unique():
    plastic_dict[klass] = plastic[plastic['material'].str.contains(klass)]
Display:
plastic_dict.keys()
Output:
dict_keys(['ZN_PLASTIC', 'ZN_PLASTIC_GR', 'ZN_PLASTIC_BL', 'ZN_PLASTIC_WH', 'ZN_PLASTIC_TP'])
Create a dict for types of glass:
glass_dict = {}
for klass in glass['material'].unique():
    glass_dict[klass] = glass[glass['material'].str.contains(klass)]
Display:
glass_dict.keys()
Output:
dict_keys(['ZN_GLASS', 'ZN_GLASS_GR', 'ZN_GLASS_BL', 'ZN_GLASS_WH', 'ZN_GLASS_TP'])
Now, I'm trying to filter some data using the dict and create a dataframe:
ac_plastic_ = {}
for i in plastic_dict.keys():
    locals()[f"ac_plastic_{i}"] = plastic_dict[i]
    locals()[f"ac_plastic_{i}"].to_csv(r'ac_plastic_' + str(i) + '.txt', index=None, header=False, sep='\t', encoding='utf-8')
But the filter fails and I get the following:
display(ac_plastic_ZN_PLASTIC)
Output:
location | recipient | material | type | colour |
---|---|---|---|---|
store | bottle | ZN_PLASTIC | bin | red |
store | bottle | ZN_PLASTIC_GR | bin | red |
store | bottle | ZN_PLASTIC_BL | bin | red |
store | bottle | ZN_PLASTIC_WH | bin | red |
store | bottle | ZN_PLASTIC_TP | bin | red |
For the more specific keys the filter works:
display(ac_plastic_ZN_PLASTIC_GR)
Output:
location | recipient | material | type | colour |
---|---|---|---|---|
store | bottle | ZN_PLASTIC_GR | bin | red |
I have tried to fix this without success. How can I solve this problem?
Thanks
Upvotes: 0
Views: 58
Reputation: 380
The issue is that you are using pandas.Series.str.contains when you create the dictionary:
glass_dict = {}
for klass in glass['material'].unique():
    glass_dict[klass] = glass[glass['material'].str.contains(klass)]
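str.contains does a substring match, so for the key ZN_PLASTIC you obtain every row whose material contains ZN_PLASTIC anywhere in the string (ZN_PLASTIC_GR, ZN_PLASTIC_BL, and so on). What you need to do is match the exact klass: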
plastic_dict = {}
for klass in plastic['material'].unique():
    plastic_dict[klass] = plastic.loc[plastic['material'] == klass]
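To see the difference side by side, here is a minimal, self-contained sketch. The sample dataframe is rebuilt by hand from the plastic rows in the question, and the names contains_dict and exact_dict are only illustrative:

```python
import pandas as pd

# Hypothetical sample mirroring the plastic rows from the question.
plastic = pd.DataFrame({
    "location": ["store"] * 5,
    "recipient": ["bottle"] * 5,
    "material": ["ZN_PLASTIC", "ZN_PLASTIC_GR", "ZN_PLASTIC_BL",
                 "ZN_PLASTIC_WH", "ZN_PLASTIC_TP"],
    "type": ["bin"] * 5,
    "colour": ["red"] * 5,
})

# Substring match: 'ZN_PLASTIC' is contained in every material value.
contains_dict = {
    klass: plastic[plastic["material"].str.contains(klass)]
    for klass in plastic["material"].unique()
}

# Exact match: each key keeps only the rows equal to that key.
exact_dict = {
    klass: plastic.loc[plastic["material"] == klass]
    for klass in plastic["material"].unique()
}

print(len(contains_dict["ZN_PLASTIC"]))  # 5
print(len(exact_dict["ZN_PLASTIC"]))     # 1
```

The more specific keys such as ZN_PLASTIC_GR give one row either way, which is why only the shortest key looked wrong.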
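Note: Prefer loc or iloc to select the rows when you create partial dataframes; otherwise you might later get warnings related to slices or copies (SettingWithCopyWarning) when you modify the result. If you plan to modify a partial dataframe afterwards, adding .copy() to the selection makes it explicitly independent of the original.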
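As an alternative to the loop, the same dictionary can probably be built with groupby, which splits on exact values, and the files can be written straight from the dict without going through locals(). This is only a sketch: the sample data is a hand-made subset of the question's rows, and the temporary output directory is an assumption made so the example is self-contained:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample with a subset of the plastic rows from the question.
plastic = pd.DataFrame({
    "location": ["store"] * 3,
    "recipient": ["bottle"] * 3,
    "material": ["ZN_PLASTIC", "ZN_PLASTIC_GR", "ZN_PLASTIC_BL"],
    "type": ["bin"] * 3,
    "colour": ["red"] * 3,
})

# groupby splits on exact values; .copy() makes each piece independent.
plastic_dict = {klass: group.copy() for klass, group in plastic.groupby("material")}

# Write one tab-separated file per key, reading from the dict directly.
out_dir = Path(tempfile.mkdtemp())  # assumption: a temporary output directory
for klass, frame in plastic_dict.items():
    frame.to_csv(out_dir / f"ac_plastic_{klass}.txt",
                 index=False, header=False, sep="\t", encoding="utf-8")

written = sorted(p.name for p in out_dir.iterdir())

# Modifying a copy leaves the original dataframe untouched.
plastic_dict["ZN_PLASTIC"]["colour"] = "blue"
```

Keeping the pieces in a dict and accessing them as plastic_dict["ZN_PLASTIC"] avoids creating variables dynamically, which is fragile inside functions.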
Upvotes: 1