Reputation: 12689
I am trying to concatenate multiple pandas DataFrame columns, separated by different tokens.
For example, my dataset looks like this:
import pandas as pd
dataframe = pd.DataFrame({'col_1' : ['aaa','bbb','ccc','ddd'],
                          'col_2' : ['name_aaa','name_bbb','name_ccc','name_ddd'],
                          'col_3' : ['job_aaa','job_bbb','job_ccc','job_ddd']})
I want to output something like this:
features
0 aaa <0> name_aaa <1> job_aaa
1 bbb <0> name_bbb <1> job_bbb
2 ccc <0> name_ccc <1> job_ccc
3 ddd <0> name_ddd <1> job_ddd
Explanation:
Join the columns with "<{}>" between them, where {} is a number that increases from one separator to the next.
What I've tried so far:
I don't want to modify the original DataFrame, so I created two new DataFrames:
features_df = pd.DataFrame()
final_df = pd.DataFrame()
for iters in range(len(dataframe.columns)):
    features_df[dataframe.columns[iters]] = dataframe[dataframe.columns[iters]] + ' ' + "<{}>".format(iters)
final_df['features'] = features_df[features_df.columns].agg(' '.join, axis=1)
The problem is that it also appends <2> after the last column, whereas I want the output shown above. This also doesn't feel like the idiomatic pandas way to do the task. How can I make it more efficient?
Upvotes: 19
Views: 891
Reputation: 71707
You can use df.agg with the optional parameter axis=1 to join the columns of the dataframe row by row (df here is the dataframe from the question). Use:
df['features'] = df.agg(
    lambda s: r' <{}> '.join(s).format(*range(s.size)), axis=1)
Output:
# print(df)
col_1 col_2 col_3 features
0 aaa name_aaa job_aaa aaa <0> name_aaa <1> job_aaa
1 bbb name_bbb job_bbb bbb <0> name_bbb <1> job_bbb
2 ccc name_ccc job_ccc ccc <0> name_ccc <1> job_ccc
3 ddd name_ddd job_ddd ddd <0> name_ddd <1> job_ddd
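To see why this works, the same expression can be run on a single row in plain Python. This is only a minimal sketch: the `row` list is an illustrative stand-in for the Series that `agg` passes to the lambda when `axis=1`, and `template` and `result` are names introduced here. The key detail is that `str.format` ignores surplus positional arguments, so passing one number per column is harmless even though there is one fewer placeholder than columns.

```python
# Stand-in for one DataFrame row (sample values taken from the question).
row = ['aaa', 'name_aaa', 'job_aaa']

# Step 1: join the values with a literal ' <{}> ' separator.
template = ' <{}> '.join(row)

# Step 2: fill the placeholders with 0, 1, 2, ...
# There are len(row) - 1 placeholders but len(row) arguments;
# str.format ignores the surplus positional argument.
result = template.format(*range(len(row)))

print(template)  # aaa <{}> name_aaa <{}> job_aaa
print(result)    # aaa <0> name_aaa <1> job_aaa
```

One caveat: if a cell value itself contains `{` or `}`, `format` will try to interpret it, so this approach assumes the data is free of braces.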
Upvotes: 8
Reputation: 28729
df['features'] = [" ".join(F"{entry}<{num}>"
                           if ent[-1] != entry
                           else entry
                           for num, entry in enumerate(ent))
                  for ent in df.to_numpy()]
col_1 col_2 col_3 features
0 aaa name_aaa job_aaa aaa<0> name_aaa<1> job_aaa
1 bbb name_bbb job_bbb bbb<0> name_bbb<1> job_bbb
2 ccc name_ccc job_ccc ccc<0> name_ccc<1> job_ccc
3 ddd name_ddd job_ddd ddd<0> name_ddd<1> job_ddd
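Note that the comprehension above puts no space before the token, and it decides whether to append a token by comparing each value with the last value in the row, so a row whose earlier cell equals its last cell would lose a token. A possible variant, sketched here under the assumption that every cell is a string, checks the position instead and adds the space from the question's expected output. Plain tuples stand in for the rows of `df.to_numpy()`, and the names `rows`, `last`, and `features` are illustrative.

```python
# Plain tuples stand in for the rows yielded by df.to_numpy() (hypothetical data).
rows = [('aaa', 'name_aaa', 'job_aaa'),
        ('bbb', 'name_bbb', 'bbb')]  # first and last cell are equal on purpose

features = []
for ent in rows:
    last = len(ent) - 1  # position of the final cell in this row
    # Append " <num>" to every cell except the one in the last position.
    features.append(" ".join(f"{entry} <{num}>" if num != last else entry
                             for num, entry in enumerate(ent)))

print(features)
```

Comparing by position keeps the `<0>` token on the second row even though its first and last cells hold the same value.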
Upvotes: 3
Reputation: 195623
from itertools import chain
dataframe['features'] = dataframe.apply(
    lambda x: ''.join([*chain.from_iterable((v, f' <{i}> ') for i, v in enumerate(x))][:-1]),
    axis=1)
print(dataframe)
Prints:
col_1 col_2 col_3 features
0 aaa name_aaa job_aaa aaa <0> name_aaa <1> job_aaa
1 bbb name_bbb job_bbb bbb <0> name_bbb <1> job_bbb
2 ccc name_ccc job_ccc ccc <0> name_ccc <1> job_ccc
3 ddd name_ddd job_ddd ddd <0> name_ddd <1> job_ddd
Upvotes: 8
Reputation: 8302
def join_(value):
    vals = []
    for i, j in enumerate(value):
        # Append the numbered token to every value except the last one.
        vals.append(j + " <%d>" % i if i < len(value) - 1 else j)
    return " ".join(vals)
# Setting axis=1 passes each row (all of its columns) to join_.
dataframe['features'] = dataframe.apply(join_, axis=1)
print(dataframe)
Output:
col_1 col_2 col_3 features
0 aaa name_aaa job_aaa aaa <0> name_aaa <1> job_aaa
1 bbb name_bbb job_bbb bbb <0> name_bbb <1> job_bbb
2 ccc name_ccc job_ccc ccc <0> name_ccc <1> job_ccc
3 ddd name_ddd job_ddd ddd <0> name_ddd <1> job_ddd
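Since the question also asks about efficiency, here is a column-wise sketch that avoids calling a Python function once per row. It assumes every column holds strings; the names `cols`, `features`, and `result` are illustrative, and no timing claims are made here.

```python
import pandas as pd

dataframe = pd.DataFrame({'col_1': ['aaa', 'bbb', 'ccc', 'ddd'],
                          'col_2': ['name_aaa', 'name_bbb', 'name_ccc', 'name_ddd'],
                          'col_3': ['job_aaa', 'job_bbb', 'job_ccc', 'job_ddd']})

cols = list(dataframe.columns)

# Start from the first column, then append " <i> " plus each following column.
# Each addition works on whole columns at once and returns a new Series.
features = dataframe[cols[0]]
for i, col in enumerate(cols[1:]):
    features = features + f" <{i}> " + dataframe[col]

# assign returns a new DataFrame, so the original is left unmodified.
result = dataframe.assign(features=features)
print(result['features'].tolist())
```

Because the token is added before each column after the first, no trailing token is produced and nothing has to be trimmed afterwards.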
Upvotes: 3