question.it

Reputation: 2968

Concatenate DataFrame columns named in a list (Python 3.6)

I have a DataFrame with 4 columns, and I want to concatenate the columns whose names are given in another list:

import pandas as pd

df = pd.DataFrame([['A', 1, 'D', 4], ['B', 2, 'C', 3], ['C', 3, 'B', 2], ['D', 4, 'A', 1]], columns=['C1', 'C2', 'C3', 'C4'])

Con = ['C1', 'C2']

df['Con'] = df['C1'].astype(str) + '|' + df['C2'].astype(str)  # column names hard-coded by hand

But the columns to concatenate change every time. How can I concatenate whichever columns are named in the input list "Con", taking the column names from the list instead of hard-coding them?

Upvotes: 1

Views: 126

Answers (4)

sammywemmy

Reputation: 28644

If speed matters to you (it does not in every scenario), you can work on individual columns, which is usually faster than working on the DataFrame as a whole:

df['New'] = df[Con[0]].astype(str).str.cat([df[c].astype(str) for c in Con[1:]], sep="|")

You can often gain more speed by doing the string manipulation in plain Python, which tends to be faster than the pandas string methods for tasks like this:

df['New'] = ["|".join(map(str, values)) for values in zip(*(df[c] for c in Con))]

Again, @HenryYik's answer is clean, no-fuss and to the point; this is just one way to get a speed improvement when it matters.
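If you want to check whether the difference matters for your own data, a rough sketch like the one below times three variants with timeit, each generalised to any list Con. The 10,000-row frame (the question's data repeated) and the helper names with_agg, with_str_cat and with_zip are assumptions made for illustration; no particular timings are implied, so measure on your real data.

```python
import timeit

import pandas as pd

# Hypothetical benchmark frame: the question's 4 rows repeated to 10,000 rows.
base = pd.DataFrame([['A', 1, 'D', 4], ['B', 2, 'C', 3], ['C', 3, 'B', 2], ['D', 4, 'A', 1]],
                    columns=['C1', 'C2', 'C3', 'C4'])
big = pd.concat([base] * 2500, ignore_index=True)
Con = ['C1', 'C2']


def with_agg(df):
    # Row-wise join via DataFrame.agg along axis=1
    return df[Con].astype(str).agg("|".join, axis=1)


def with_str_cat(df):
    # Column-wise concatenation: Series.str.cat accepts a list of other Series
    return df[Con[0]].astype(str).str.cat([df[c].astype(str) for c in Con[1:]], sep="|")


def with_zip(df):
    # Plain-Python join over the zipped column values
    return pd.Series(["|".join(map(str, values)) for values in zip(*(df[c] for c in Con))],
                     index=df.index)


# All three variants should produce the same strings (compare as lists to ignore dtype/name).
assert with_agg(big).tolist() == with_str_cat(big).tolist() == with_zip(big).tolist()

for fn in (with_agg, with_str_cat, with_zip):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

Comparing with .tolist() sidesteps differences in Series name or string dtype between pandas versions, so the check only looks at the concatenated values.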

Upvotes: 2

anky

Reputation: 75080

@HenryYik's answer is the de facto method in pandas; here is another approach, which is also performant:

df["new"] = [*map('|'.join, df[Con].astype(str).to_numpy().tolist())]
print(df)

  C1  C2 C3  C4  new
0  A   1  D   4  A|1
1  B   2  C   3  B|2
2  C   3  B   2  C|3
3  D   4  A   1  D|4
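To make the one-liner easier to follow, here is a sketch that splits it into its two steps and prints the intermediate values; the names rows and joined are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame([['A', 1, 'D', 4], ['B', 2, 'C', 3], ['C', 3, 'B', 2], ['D', 4, 'A', 1]],
                  columns=['C1', 'C2', 'C3', 'C4'])
Con = ['C1', 'C2']

# Step 1: select the listed columns, convert them to strings,
# and turn the result into a plain list of row lists.
rows = df[Con].astype(str).to_numpy().tolist()
print(rows)  # [['A', '1'], ['B', '2'], ['C', '3'], ['D', '4']]

# Step 2: join each row list with '|'; map() is lazy, and [*...] materialises it into a list.
joined = [*map('|'.join, rows)]
print(joined)  # ['A|1', 'B|2', 'C|3', 'D|4']

df["new"] = joined
```

Because the column names come only from Con, changing the list changes which columns are joined without touching the rest of the code.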

Upvotes: 1

Anurag Saraf

Reputation: 53

This should work for you:

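def concat(row, cons):
    # Start with the first listed column, converted to a string
    fin = str(row[cons[0]])
    # Append each remaining listed column, separated by "|"
    for con in cons[1:]:
        fin = fin + "|" + str(row[con])
    return fin

# axis=1 passes each row (as a Series) to the lambda
df['Con'] = df.apply(lambda x: concat(x, Con), axis=1)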
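Since concat loops over whatever names are in the list, the same function handles a list of any length. A small sketch with a hypothetical three-column list:

```python
import pandas as pd


def concat(row, cons):
    # Same logic as above: join the listed columns of one row with "|"
    fin = str(row[cons[0]])
    for con in cons[1:]:
        fin = fin + "|" + str(row[con])
    return fin


df = pd.DataFrame([['A', 1, 'D', 4], ['B', 2, 'C', 3], ['C', 3, 'B', 2], ['D', 4, 'A', 1]],
                  columns=['C1', 'C2', 'C3', 'C4'])

Con = ['C1', 'C3', 'C4']  # hypothetical: a different set of columns this time
df['Con'] = df.apply(lambda x: concat(x, Con), axis=1)
print(df['Con'].tolist())  # ['A|D|4', 'B|C|3', 'C|B|2', 'D|A|1']
```

Note that apply with axis=1 calls Python code once per row, so on large frames it is typically slower than the column-wise approaches.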

Upvotes: 1

Henry Yik

Reputation: 22493

If I understand correctly, you can use agg with axis=1 and str.join:

Con = ['C1', 'C2']

df["new"] = df[Con].astype(str).agg("|".join, axis=1)

print(df)

  C1  C2 C3  C4  new
0  A   1  D   4  A|1
1  B   2  C   3  B|2
2  C   3  B   2  C|3
3  D   4  A   1  D|4
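If the column list and separator change from call to call, the one-liner can be wrapped in a small function. This is a sketch; the name concat_columns and its sep parameter are hypothetical, not part of pandas.

```python
import pandas as pd


def concat_columns(df, cols, sep="|"):
    """Join the columns named in `cols` row by row, separated by `sep`.

    Hypothetical helper wrapping the agg/str.join approach above.
    """
    # astype(str) makes every value a string so str.join can combine them
    return df[list(cols)].astype(str).agg(sep.join, axis=1)


df = pd.DataFrame([['A', 1, 'D', 4], ['B', 2, 'C', 3], ['C', 3, 'B', 2], ['D', 4, 'A', 1]],
                  columns=['C1', 'C2', 'C3', 'C4'])

df["new"] = concat_columns(df, ['C1', 'C2'])
df["new2"] = concat_columns(df, ['C4', 'C3', 'C1'], sep="-")
print(df)
```

The column order in the list controls the order in the output string, so ['C4', 'C3', 'C1'] yields values such as 4-D-A.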

Upvotes: 6
