rom

Reputation: 664

Groupby Apply/Transform Custom Function With Arguments Pandas

I'm doing some NLP work and am trying to use groupby with a lambda that sends a POST request for each title. The request returns a JSON object, but the new column unfortunately ends up as NaN. What I need is for the fields of the response to be 'exploded' and added to the DataFrame as new columns.

Custom function:

def posTagger(text):
    post = { "text": title }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    return {"title": title, "result": r}


posTagger return value:

[
    {
        "text": "Contemporary Modern Soft Area Rugs Nonslip",
        "terms": [
            {
                "text": "Contemporary",
                "penn": "JJ",
                "tags": [
                    "Adjective"
                ]
            },
            {
                "text": "Modern",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Singular"
                ]
            },
            {
                "text": "Soft",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Singular"
                ]
            },
            {
                "text": "Area",
                "penn": "NN",
                "tags": [
                    "Singular",
                    "Noun",
                    "ProperNoun"
                ]
            },
            {
                "text": "Rugs",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Plural"
                ]
            },
            {
                "text": "Nonslip",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Singular"
                ]
            }
        ]
    }
]

DataFrame

title = [
    'Contemporary Modern Soft Area Rugs Nonslip Velvet Home Room Carpet Floor Mat Rug', 
    'Traditional Distressed Area Rug 8x10 Large Rugs for Living Room 5x8 Gray Ivory', 
    'Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Carpet Living Room Bedroom Large Rug'
    ]
df = pd.DataFrame(title, columns=['title'])
df

# Initial dataframe:

# title
# 0 Contemporary Modern Soft Area Rugs Nonslip...
# 1 Traditional Distressed Area Rug 8x10 Large...
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft...

So, here's my groupby using .apply:

df['result'] = pd.DataFrame(df.groupby(['title']).apply(lambda x: posTagger(x)))
df

# Resulting DataFrame after **.apply**:

#   title   result
# 0 Contemporary Modern Soft Area Rugs Nonslip Vel...   NaN
# 1 Traditional Distressed Area Rug 8x10 Large Rug...   NaN
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Car...   NaN

So, here's my groupby using .transform:

df['result'] = pd.DataFrame(df.groupby(['title']).transform(lambda x: posTagger(x)))
df

# Resulting DataFrame after **.transform**:

# title result
# 0 Contemporary Modern Soft Area Rugs Nonslip Vel...   {'title': ['Contemporary Modern Soft Area Rugs...
# 1 Traditional Distressed Area Rug 8x10 Large Rug...   {'title': ['Contemporary Modern Soft Area Rugs...
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Car...   {'title': ['Contemporary Modern Soft Area Rugs...

Notice that .transform put the same value in every row. Why?

  1. How do I get the return value from the custom function (which returns an object with nested arrays) added in exploded form to the same dataframe as new columns?
  2. Is it better to use .apply or .transform to achieve this?

Upvotes: 0

Views: 502

Answers (1)

Jonathan Leon

Reputation: 5648

I will discuss apply() here, and there are a couple of considerations for you to think through.

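For your current function, to get that result (the dictionary) into a new column, you can keep the function almost as written and change the code that calls it. One fix is needed in the function itself: it takes text as its parameter, but the body uses title, which is the global list of all titles. That is most likely why every row of your .transform output shows the same list. You aren't really grouping on title unless some titles are duplicated, so just use apply() on the title column without groupby(). This will not explode the dictionary. There are many ways to think about that.

import time
import requests

def posTagger(text):
    # use the text argument, not the global title list
    post = { "text": text }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    return {"title": text, "result": r}

# apply on the column so the function receives one title string per row
df['result'] = df['title'].apply(posTagger)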
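
As one possible way to get the exploded form, here is a minimal sketch using pd.json_normalize. It assumes the response has the shape shown in the question (a list holding one object with "text" and "terms"). The fake_pos_tagger function and the term_ column prefix are hypothetical stand-ins so the example runs without the local endpoint.

```python
import pandas as pd

# Hypothetical stand-in for the HTTP call: returns data shaped like the
# posTagger response in the question, without touching the network.
def fake_pos_tagger(text):
    terms = [{"text": w, "penn": "NN", "tags": ["Noun"]} for w in text.split()]
    return [{"text": text, "terms": terms}]

df = pd.DataFrame({"title": ["Soft Area Rug", "Large Gray Rug"]})

# Call the tagger once per title and keep the single top-level object.
records = [fake_pos_tagger(t)[0] for t in df["title"]]

# One row per term. record_prefix avoids a name clash between the
# term-level "text" field and the sentence-level "text" field.
exploded = pd.json_normalize(
    records,
    record_path="terms",
    meta=["text"],
    record_prefix="term_",
).rename(columns={"text": "title"})

print(exploded)
```

Each row of exploded is one term with its title alongside, so it can be merged back onto df by title if df has other columns. The term_tags column still holds lists; exploded.explode("term_tags") would give one row per tag if that is the form you need.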

Now, if you do want to use groupby().apply(), the function receives each group's sub-DataFrame as x, operates on it, and returns x. This isn't tested, but it is one way to think about the problem.

def posTagger(x):
    # x is the sub-DataFrame for one title; every row in it shares that title
    # note: pandas 2.2+ warns that grouping columns will be excluded from x
    # in a future version
    text = x['title'].iloc[0]
    post = { "text": text }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    x = x.copy()
    # one copy of the dictionary per row in the group
    x['result'] = [{"title": text, "result": r}] * len(x)
    # or you may be able to code the explode here using something like
    # dftemp = pd.DataFrame({"title": text, "result": r})
    # x = x.merge(dftemp)
    # not tested at all, but this would return x to the original dataframe
    return x

df = df.groupby('title', group_keys=False).apply(posTagger)
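
As for the NaN column in the question, a likely explanation is index alignment: a per-group result is indexed by the group key (the title strings), while df is indexed 0, 1, 2, so assigning one to the other finds no matching labels. The sketch below illustrates this with groupby().size() standing in for the tagger call; the column names misaligned and aligned are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"title": ["Soft Area Rug", "Large Gray Rug", "Shaggy Rug"]})

# A per-group result is indexed by the group key (the title strings here),
# not by the 0..n-1 row labels of df.
per_title = df.groupby("title").size()

# Direct assignment aligns on index labels; none match, so every row is NaN.
df["misaligned"] = per_title

# Mapping through the title column looks up each row's title in the result.
df["aligned"] = df["title"].map(per_title)

print(df)
```

Mapping through the key column, or skipping groupby() entirely when titles are unique, avoids the mismatch.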

Upvotes: 1
