I'm doing some NLP work. I'm trying to use groupby to make a POST request inside a lambda function, and I get a JSON object back that, unfortunately, ends up as NaN in my DataFrame. I need it to add the fields to the DataFrame after "exploding" them.
Custom function:
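import time
import requests

def posTagger(text):
    # Note: the payload is built from the global `title` list, not from `text`
    post = { "text": title }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    return {"title": title, "result": r}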
posTagger return value:
[
  {
    "text": "Contemporary Modern Soft Area Rugs Nonslip",
    "terms": [
      {"text": "Contemporary", "penn": "JJ", "tags": ["Adjective"]},
      {"text": "Modern", "penn": "NNP", "tags": ["ProperNoun", "Noun", "Singular"]},
      {"text": "Soft", "penn": "NNP", "tags": ["ProperNoun", "Noun", "Singular"]},
      {"text": "Area", "penn": "NN", "tags": ["Singular", "Noun", "ProperNoun"]},
      {"text": "Rugs", "penn": "NNP", "tags": ["ProperNoun", "Noun", "Plural"]},
      {"text": "Nonslip", "penn": "NNP", "tags": ["ProperNoun", "Noun", "Singular"]}
    ]
  }
]
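For reference, a nested response like the one above can be flattened into one row per term with `pandas.json_normalize`. This is a minimal sketch on a trimmed, hard-coded copy of the response (only two terms), so it runs without the local API; the `sentence_` prefix is an arbitrary choice.

```python
import pandas as pd

# Trimmed copy of the posTagger response shown above (two terms only).
response = [
    {
        "text": "Contemporary Modern Soft Area Rugs Nonslip",
        "terms": [
            {"text": "Contemporary", "penn": "JJ", "tags": ["Adjective"]},
            {"text": "Modern", "penn": "NNP", "tags": ["ProperNoun", "Noun", "Singular"]},
        ],
    }
]

# One row per term; the parent sentence is carried along as metadata.
# Both levels have a "text" key, so meta_prefix avoids a name clash.
terms = pd.json_normalize(
    response, record_path="terms", meta=["text"], meta_prefix="sentence_"
)
print(terms)
```

The `tags` column stays a list per row; it can be exploded further with `terms.explode("tags")` if one row per tag is wanted.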
DataFrame
import pandas as pd

title = [
    'Contemporary Modern Soft Area Rugs Nonslip Velvet Home Room Carpet Floor Mat Rug',
    'Traditional Distressed Area Rug 8x10 Large Rugs for Living Room 5x8 Gray Ivory',
    'Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Carpet Living Room Bedroom Large Rug'
]
df = pd.DataFrame(title, columns=['title'])
df
# Initial dataframe:
# title
# 0 Contemporary Modern Soft Area Rugs Nonslip...
# 1 Traditional Distressed Area Rug 8x10 Large...
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft...
So, here's my groupby using .apply:
df['result'] = pd.DataFrame(df.groupby(['title']).apply(lambda x: posTagger(x)))
df
# Resulting DataFrame after .apply:
# title result
# 0 Contemporary Modern Soft Area Rugs Nonslip Vel... NaN
# 1 Traditional Distressed Area Rug 8x10 Large Rug... NaN
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Car... NaN
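The NaN column is consistent with index alignment: `groupby(...).apply` returns a result indexed by the group key (the title), while `df` has a 0, 1, 2 index, so assigning it to `df['result']` matches no rows. A minimal sketch, assuming a hypothetical `word_count` stand-in instead of the real HTTP call so it runs offline:

```python
import pandas as pd

def word_count(text: str) -> int:
    # Hypothetical stand-in for posTagger so the example runs without the API.
    return len(text.split())

df = pd.DataFrame({"title": ["Soft Area Rug", "Large Gray Rug Ivory", "Shaggy Rug"]})

# groupby(...).apply returns a Series indexed by the group key (the title),
# not by df's default 0..2 index.
per_title = df.groupby("title")["title"].apply(lambda s: word_count(s.iloc[0]))
print(per_title.index.tolist())

# Column assignment aligns on the index; no title equals 0, 1 or 2, so all NaN.
df["result"] = per_title
all_nan = df["result"].isna().all()
print(all_nan)

# Mapping each row's title through the per-title Series lines the values up.
df["result"] = df["title"].map(per_title)
print(df)
```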
So, here's my groupby using .transform:
df['result'] = pd.DataFrame(df.groupby(['title']).transform(lambda x: posTagger(x)))
df
# Resulting DataFrame after .transform:
# title result
# 0 Contemporary Modern Soft Area Rugs Nonslip Vel... {'title': ['Contemporary Modern Soft Area Rugs...
# 1 Traditional Distressed Area Rug 8x10 Large Rug... {'title': ['Contemporary Modern Soft Area Rugs...
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Car... {'title': ['Contemporary Modern Soft Area Rugs...
Notice that .transform's result repeats the same value on every row. Why? And should I use .apply or .transform to achieve this?
I will discuss apply() here; there are a couple of considerations for you to think through.
For your current function, to get that result (the dictionary) you can keep the function and change the code that calls it. You aren't really grouping on title unless some titles repeat, so just use apply() without groupby(). This will not explode the dictionary; there are several ways to approach that.
import time
import requests

def posTagger(text):
    post = {"text": text}
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    return {"title": text, "result": r}

# Apply row by row on the title column (DataFrame.apply would pass whole columns)
df['result'] = df['title'].apply(posTagger)
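One way to do the explode after a row-wise apply is to pull the `terms` list out of each response, `explode` it to one row per term, and expand each term dict into columns. This is a sketch only: `fake_tagger` is a hypothetical offline stand-in that returns the same shape as the API response in the question (it just tags every word `NN`).

```python
import pandas as pd

def fake_tagger(text):
    # Hypothetical stand-in for the POS-tagging API: same response shape as
    # in the question, but every word is simply tagged "NN".
    return [{"text": text,
             "terms": [{"text": w, "penn": "NN", "tags": ["Noun"]} for w in text.split()]}]

df = pd.DataFrame({"title": ["Soft Area Rug", "Shaggy Rug"]})

# Row-wise: Series.apply calls the tagger once per title.
df["result"] = df["title"].apply(fake_tagger)

# Take the list of terms from each response and explode to one row per term.
long = (
    df.assign(term=df["result"].map(lambda r: r[0]["terms"]))
      .drop(columns="result")
      .explode("term", ignore_index=True)
)

# Expand each term dict into its own columns (text, penn, tags).
terms = pd.json_normalize(long["term"].tolist())
out = pd.concat([long.drop(columns="term"), terms], axis=1)
print(out)
```

With the real function, swap `fake_tagger` for a wrapper that returns only the API response (for example `posTagger(t)["result"]`).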
Now, if you do want to use groupby().apply(), each group is passed to your function as a DataFrame x; you operate on it and then return x. This isn't tested, but it is one way to think about the problem.
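import time
import requests

def posTagger(x):
    # x is the sub-DataFrame for one title; every row in it has the same title
    x = x.copy()
    text = x['title'].iloc[0]
    post = {"text": text}
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    # Store the same response dict on every row of the group
    x['result'] = [{"title": text, "result": r}] * len(x)
    # or you may be able to do the explode here with something like
    # dftemp = pd.DataFrame({"title": text, "result": r})
    # x = x.merge(dftemp)
    # (not tested at all), which would return the exploded rows for this group
    return x

# group_keys=False keeps the original index instead of prefixing it with the title
df = df.groupby(['title'], group_keys=False).apply(posTagger)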
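If the reason for grouping is to avoid sending the same title to the API twice, a simpler pattern is to call the tagger once per unique title and map the results back. A minimal sketch, assuming a hypothetical `fake_tagger` that only records each call instead of making an HTTP request:

```python
import pandas as pd

calls = []

def fake_tagger(text):
    # Hypothetical stand-in for posTagger that records each "request".
    calls.append(text)
    return {"title": text, "n_terms": len(text.split())}

df = pd.DataFrame({"title": ["Soft Area Rug", "Shaggy Rug", "Soft Area Rug"]})

# Tag each distinct title once (like one call per group), then broadcast
# the cached result back to every row that has that title.
cache = {t: fake_tagger(t) for t in df["title"].unique()}
df["result"] = df["title"].map(cache.get)

print(len(calls))
print(df)
```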