Reputation: 57
I have a dataframe df:
Col1 Col2 Col3
0 a1 NaN NaN
1 a2 b1 NaN
2 a3 b3 c1
3 a4 NaN c2
I have tried:
new_df = '[' + df + ']'
new_df['Col4']=new_df[new_df.columns[0:]].apply(lambda x:','.join(x.dropna().astype(str)),axis =1)
df_final = pd.concat([df, new_df['col4']], axis =1)
This wraps every value in brackets, including the first one in each row, which is not what I want.
I was looking for a robust solution to produce something like this:
Col1 Col2 Col3 Col4
0 a1 NaN NaN a1
1 a2 b1 NaN a2,[b1]
2 a3 b3 c1 a3,[b3],[c1]
3 a4 NaN c2 a4,[c2]
I know there is no direct way to do this. The data frame will eventually be at least 20k rows, hence this question to fellow stack-people.
Thanks. Let me know if you have any more questions and I can edit the question to add details.
Upvotes: 0
Views: 2723
Reputation: 862681
You can wrap every value in [] except the first non-missing one in each row, using the helper counter i from enumerate:
def f(x):
    # keep only the non-missing values of the row
    gen = (y for y in x if pd.notna(y))
    # the first value stays bare, every later value is wrapped in []
    return ','.join(y if i == 0 else '[' + y + ']' for i, y in enumerate(gen))

#f = lambda x: ','.join(y if i == 0 else '['+y+']' for i, y in enumerate(x.dropna()))
df['col4'] = df.apply(f, axis=1)
print(df)
Col1 Col2 Col3 Col4 col4
0 a1 NaN d8 NaN a1,[d8]
1 a2 b1 d3 NaN a2,[b1],[d3]
2 NaN b3 c1 NaN b3,[c1]
3 a4 NaN c2 NaN a4,[c2]
4 NaN NaN c6 d5 c6,[d5]
Performance test:
#test for 25k rows
df = pd.concat([df] * 5000, ignore_index=True)
f1 = lambda x: ','.join(y if i == 0 else '['+y+']' for i, y in enumerate(x.dropna()))
%timeit df.apply(f1, axis=1)
3.62 s ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.apply(f, axis=1)
475 ms ± 3.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Upvotes: 3
Reputation: 1015
new_col = []
for idx, row in df.iterrows():
    val1 = row["Col1"]
    val2 = row["Col2"]
    val3 = row["Col3"]
    # wrap the second and third values in brackets only when they are present
    new_val2 = f",[{val2}]" if pd.notna(val2) else ""
    new_val3 = f",[{val3}]" if pd.notna(val3) else ""
    val4 = f"{val1}{new_val2}{new_val3}"
    new_col.append(val4)
df["Col4"] = new_col
Maybe my answer is not the most "computationally efficient", but if your dataset is 20k rows, it will be fast enough! I think it is very easy to read, and it is also easy to adapt to different scenarios, for example as sketched below.
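A minimal sketch of one such adaptation, assuming you want to loop over an arbitrary list of columns and bracket everything after the first non-missing value in each row (the cols list and that rule are assumptions here, not part of the original answer):
import pandas as pd

cols = ["Col1", "Col2", "Col3"]  # assumed column list; extend it for more columns

new_col = []
for _, row in df.iterrows():
    parts = []
    for col in cols:
        val = row[col]
        if pd.isna(val):
            continue
        # the first non-missing value stays bare, later ones get brackets
        parts.append(str(val) if not parts else f"[{val}]")
    new_col.append(",".join(parts))

df["Col4"] = new_col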
Upvotes: 1
Reputation: 1960
I'm not sure what your use case is, but here you go:
df['Col4'] = df.apply(lambda row:", ".join([(val if val[0]=='a' else "["+val+"]") for val in row if not pd.isna(val)]), axis=1)
It joins each row's values together with ", ".join, but only the values that are not pd.isna. It further puts everything in brackets that does not begin with "a".
Whatever you want to do with it, there is probably a better solution, though.
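For readability, the same one-liner can be sketched as a named function (just a restatement of the lambda above, not a different approach):
import pandas as pd

def combine(row):
    parts = []
    for val in row:
        if pd.isna(val):
            continue
        # values starting with "a" stay bare, everything else gets brackets
        parts.append(val if val[0] == 'a' else "[" + val + "]")
    return ", ".join(parts)

df['Col4'] = df.apply(combine, axis=1)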
Upvotes: 2