FenryrMKIII

Reputation: 1198

Why isn't my source pandas DataFrame updated in the following function?

I have the following function (a minimal working example mimicking what I actually do in my complete code):

import pandas as pd
import numpy as np

# hardcoded data for reproducibility
df = pd.DataFrame(
    [
        ["SiteA", "Long_Key_With_KeyWord", np.nan],
        ["SiteA", "Long_Key_Without", np.nan],
        ["SiteB", "Long_Key_With_KeyWord", np.nan],
    ],
    columns=["site", "tags", "to_fill"],
)
library = {"SiteA": {"KeyWord": "NewKeyWord"}}

# logic
df_part = df.loc[df.to_fill.isna(), :]
groupby_site = df_part.groupby("site")

for site in groupby_site.groups.keys():
    site_data = groupby_site.get_group(site)
    try:
        library_site_data = library[site]
        for idx, row in site_data.iterrows():
            hits = [key in row["tags"] for key in library_site_data.keys()]
            match = [key for key, hit in zip(library_site_data.keys(), hits) if hit]
            if match:
                value = library_site_data[match[0]]
                df_part.loc[idx, "to_fill"] = value
            else:
                print("Too bad")
    except KeyError:
        print(f"no data for site {site} in library")
        continue

print(
    f"Total unfound mapping tags {df.to_fill.isna().sum()}"
)  # why isn't the df being filled in ?

What I don't understand is why df isn't being filled in. I believe df_part is a reference to df, so filling in df_part should also fill in df.
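One way to test the "df_part is a reference to df" assumption in isolation is a minimal sketch like the following. It uses a hypothetical three-row frame rather than the data above: it selects rows with a boolean mask, writes into the selection, and checks whether the original changed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two missing values, one filled value
df = pd.DataFrame({"to_fill": [np.nan, np.nan, 1.0]})

# Selecting rows with a boolean mask builds a new DataFrame (a copy),
# not a window onto df.
df_part = df.loc[df["to_fill"].isna(), :]

# This writes into df_part only. pandas < 3.0 may also emit a
# SettingWithCopyWarning here, which hints that df_part is a copy.
df_part.loc[0, "to_fill"] = 42.0

print(df_part["to_fill"].tolist())  # [42.0, nan]
print(df["to_fill"].tolist())       # [nan, nan, 1.0]
```

If df_part were a view, the second print would show 42.0 in the first position; it does not.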

I get this:

print(df)
    site                   tags  to_fill
0  SiteA  Long_Key_With_KeyWord      NaN
1  SiteA       Long_Key_Without      NaN
2  SiteB  Long_Key_With_KeyWord      NaN

and I want this:

    site                   tags     to_fill
0  SiteA  Long_Key_With_KeyWord  NewKeyWord
1  SiteA       Long_Key_Without         NaN
2  SiteB  Long_Key_With_KeyWord         NaN


What am I missing?

Upvotes: 0

Views: 76

Answers (1)

Nimrod Carmel

Reputation: 508

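I think the problem is that df.loc[df.to_fill.isna(), :] selects rows with a boolean mask, and that kind of indexing returns a copy rather than a view. The later df_part.loc[idx, "to_fill"] = value then writes into that copy (a two-step, "chained" indexing pattern), so df itself is never modified. The first answer for this question explains it further.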
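A minimal sketch of one possible fix, assuming the goal is simply to get the values into df: keep df_part for reading, but write through df itself. This works because the row labels that iterrows() yields for df_part are df's own index labels. Two changes go beyond the original code and are assumptions on my part: the to_fill column is converted to object dtype first (recent pandas versions warn about, or reject, writing strings into an all-NaN float64 column), and library.get() replaces the try/except KeyError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["SiteA", "Long_Key_With_KeyWord", np.nan],
        ["SiteA", "Long_Key_Without", np.nan],
        ["SiteB", "Long_Key_With_KeyWord", np.nan],
    ],
    columns=["site", "tags", "to_fill"],
)
library = {"SiteA": {"KeyWord": "NewKeyWord"}}

# Assumption: store to_fill as object so assigning strings does not
# run into pandas' dtype-upcasting deprecation for float64 columns.
df["to_fill"] = df["to_fill"].astype(object)

# df_part is only read from; its index labels are df's labels.
df_part = df.loc[df["to_fill"].isna(), :]

for site, site_data in df_part.groupby("site"):
    library_site_data = library.get(site)
    if library_site_data is None:
        print(f"no data for site {site} in library")
        continue
    for idx, row in site_data.iterrows():
        match = [key for key in library_site_data if key in row["tags"]]
        if match:
            # Write through the original DataFrame, not the copy
            df.loc[idx, "to_fill"] = library_site_data[match[0]]
        else:
            print("Too bad")

print(f"Total unfound mapping tags {df['to_fill'].isna().sum()}")
```

With the question's data this prints "Too bad" for row 1 and "no data for site SiteB in library", then reports 2 unfound tags, leaving df in the desired state with NewKeyWord in row 0.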

Upvotes: 1
