Reputation: 319
Here is my starting df:
import numpy as np
import pandas as pd
df = pd.DataFrame(['alpha', 'beta'], columns = ['text'])
df
    text
0  alpha
1   beta
Here is the end result I want:
    text        first        second        third
0  alpha  alpha-first  alpha-second  alpha-third
1   beta   beta-first   beta-second   beta-third
I have written the custom function parse(), no issue there:
def parse(text):
    # Hyphen-separated suffixes, matching the desired output above
    return [text + '-first', text + '-second', text + '-third']
Now I try to apply parse() to the initial df, which is where errors arise:
1) If I try the following:
df = df.reindex(columns = list(df.columns) + ['first', 'second', 'third']) # Create empty columns
df[['first', 'second', 'third']] = df.text.apply(parse)
I get:
ValueError: Must have equal len keys and value when setting with an ndarray
2) Slightly different version:
df = df.reindex(columns = list(df.columns) + ['first', 'second', 'third']).astype(object) # Create empty columns of "object" type
df[['first', 'second', 'third']] = df.text.apply(parse)
I get:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast
to indexing result of shape (3,2)
Where am I going wrong?
EDIT:
I should clarify that parse() itself is a much more complicated function in the real-world problem I'm trying to solve: it takes a paragraph, finds 3 specific types of strings in it, and outputs those strings as a list of length 3. In my code above, I made up a simple stand-in definition of parse() to avoid getting bogged down in details unrelated to the two errors I'm getting.
Upvotes: 1
Views: 160
Reputation: 1256
Check this:
lst = ['text','first','second','third']
df = pd.DataFrame([['alpha']*len(lst),['beta']*len(lst)],columns=lst)
final = df.apply(lambda x: x+'-'+x.name)
final['text'] = final['text'].str.split('-').str[0]  # keep only the part before the hyphen in each row
Upvotes: 0
Reputation: 210832
This can be done in several ways:
Option 1:
def f(s):
    return pd.DataFrame(np.repeat(s, 3).values.reshape(len(s), -1),
                        columns=['first','second','third']) \
             .apply(lambda c: c+'-'+c.name)
In [183]: df[['first','second','third']] = f(df.text)
In [184]: df
Out[184]:
    text        first        second        third
0  alpha  alpha-first  alpha-second  alpha-third
1   beta   beta-first   beta-second   beta-third
Upvotes: 1
Reputation: 5215
Here's a one-liner with pd.DataFrame.assign:
df.assign(**{x: df['text']+'-'+x for x in ['first', 'second', 'third']})
#     text        first        second        third
# 0  alpha  alpha-first  alpha-second  alpha-third
# 1   beta   beta-first   beta-second   beta-third
Upvotes: 1
Reputation: 164623
No need for apply:
import pandas as pd
df = pd.DataFrame(['alpha', 'beta'], columns = ['text'])
for i in ['first', 'second', 'third']:
    df[i] = df.text + '-' + i
#     text        first        second        third
# 0  alpha  alpha-first  alpha-second  alpha-third
# 1   beta   beta-first   beta-second   beta-third
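In general, prefer vectorised operations like the loop above where possible. When they are not, the hierarchy of "process type" to choose for your calculations, from most to least preferred, should be: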
pd.Series.apply
pd.DataFrame.apply
pd.DataFrame.iterrows
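For a parse() that cannot be vectorised, such as the real one described in the question's EDIT, a minimal sketch using pd.Series.apply might look like the following. It assumes parse() always returns a list of exactly three strings; the parse() body here is only the question's simplified stand-in (with hyphens), and the intermediate name parsed is made up for illustration. The original attempts fail because df.text.apply(parse) is a Series holding 2 lists (shape (2,)), not a 2x3 block, so it has to be expanded before it can fill three columns.

```python
import pandas as pd

def parse(text):
    # Stand-in for the real parser; assumed to return exactly three strings.
    return [text + '-first', text + '-second', text + '-third']

df = pd.DataFrame(['alpha', 'beta'], columns=['text'])

# Series.apply gives one list per row. Building a DataFrame from those lists
# spreads each list over three columns, aligned on the original index.
parsed = pd.DataFrame(df['text'].apply(parse).tolist(),
                      index=df.index,
                      columns=['first', 'second', 'third'])

# Attach the three new columns next to the original 'text' column.
df = df.join(parsed)
```

This calls parse() once per row, so it is slower than the vectorised loop above, but it works for any function that returns a fixed-length list.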
Upvotes: 2