Reputation: 99
I've created a DataFrame whose shape depends on the input data, so it can have anywhere between 3 and 100 columns. I'm looking for a way to end up with only 3 columns, where every column from the third onward (however many there are, since the number is unknown in advance) is concatenated into the third column as a comma-separated list.
Upvotes: 1
Views: 58
Reputation: 14949
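Try:
import pandas as pd

df = pd.concat(
    [
        df[df.columns[:2]],  # keep the first two columns unchanged
        df[df.columns[2:]].apply(list, axis=1),  # collapse the rest into one list per row
    ],
    axis=1,
)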
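To filter out NaN values, use a list comprehension. Checking with pd.notna is more reliable than testing membership against np.nan, because NaN never compares equal to itself (and the np.NAN alias was removed in NumPy 2.0):
import pandas as pd

result = pd.concat(
    [
        df[df.columns[:2]],
        df[df.columns[2:]].apply(
            # pd.notna drops real missing values; the string check drops stringified ones
            lambda x: [i for i in x if pd.notna(i) and i not in ('nan', 'NaN', 'None')],
            axis=1,
        ),
    ],
    axis=1,
)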
If you want a comma-separated string instead of a list, use the following (values are passed through str() so that non-string columns do not break the join):
import pandas as pd

result = pd.concat(
    [
        df[df.columns[:2]],
        df[df.columns[2:]].apply(
            lambda x: ', '.join(
                str(i) for i in x
                if pd.notna(i) and i not in ('nan', 'NaN', 'None')
            ),
            axis=1,
        ),
    ],
    axis=1,
)
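For reference, here is a minimal self-contained sketch of the comma-separated variant run on made-up data. The sample frame, the helper name collapse_extra_columns, and the column name "combined" are all assumptions for illustration, not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two fixed columns plus a variable number of extras.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "name": ["a", "b"],
        "x1": ["p", "q"],
        "x2": ["r", np.nan],
        "x3": [np.nan, np.nan],
    }
)


def collapse_extra_columns(frame, keep=2, sep=", ", new_name="combined"):
    """Keep the first `keep` columns and join all remaining ones into one string column."""
    head = frame.iloc[:, :keep]
    tail = frame.iloc[:, keep:]
    # Join the non-missing values of each row; str() guards against non-string values.
    combined = tail.apply(
        lambda row: sep.join(str(v) for v in row if pd.notna(v)),
        axis=1,
    )
    return pd.concat([head, combined.rename(new_name)], axis=1)


result = collapse_extra_columns(df)
print(result)
```

With this sample, the "combined" column holds "p, r" for the first row and "q" for the second. Naming the new column explicitly avoids the default label 0 that the concatenated Series would otherwise get.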
Upvotes: 1