Reputation: 195
I have a DataFrame in which every column holds strings, like below:
ID | Var1 | Var2 | Var3 | Var4 | Var5 |
---|---|---|---|---|---|
1 | A | B | C | D | E |
2 | | B | C | D | |
3 | | | C | D | E |
4 | A | | C | | E |
I would like to pick the first 3 non-empty elements of each row, like below:
ID | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
---|---|---|---|---|---|---|
1 | A | B | C | D | E | A,B,C |
2 | | B | C | D | | B,C,D |
3 | | | C | D | E | C,D,E |
4 | A | | C | | E | A,C,E |
Upvotes: 3
Views: 1776
Reputation: 23237
Try this:
df['Var6'] = df.apply(lambda x: [y for y in x.values if y != ''][:3], axis=1)
Resulting df:
Var1 Var2 Var3 Var4 Var5 Var6
1 A B C D E [A, B, C]
2 B C D [B, C, D]
3 C D E [C, D, E]
4 A C E [A, C, E]
If you want the result as a comma-separated string, follow up with:
df['Var6'] = df['Var6'].str.join(', ')
Resulting df:
Var1 Var2 Var3 Var4 Var5 Var6
1 A B C D E A, B, C
2 B C D B, C, D
3 C D E C, D, E
4 A C E A, C, E
If you want to do it in one step, use:
df['Var6'] = df.apply(lambda x: ','.join([y for y in x.values if y != ''][:3]), axis=1)
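For readers who want to run the one-step version end to end, here is a minimal sketch. It assumes the blank cells are empty strings and that `ID` is the row index; the sample frame and the helper name `first_non_empty` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample frame modelled on the question; blanks are assumed to be empty strings.
df = pd.DataFrame(
    {
        "Var1": ["A", "", "", "A"],
        "Var2": ["B", "B", "", ""],
        "Var3": ["C", "C", "C", "C"],
        "Var4": ["D", "D", "D", ""],
        "Var5": ["E", "", "E", "E"],
    },
    index=pd.Index([1, 2, 3, 4], name="ID"),
)


def first_non_empty(row, n=3):
    """Join the first n non-empty values of a row with commas (hypothetical helper)."""
    return ",".join([v for v in row if v != ""][:n])


# axis=1 hands each row to the helper as a Series.
df["Var6"] = df.apply(first_non_empty, axis=1)
print(df)
```

Pulling the lambda out into a named function makes the number of elements (`n`) easy to change.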
I interpreted `ID` in the sample data as the row index when I wrote my answer, because the OP said the DataFrame holds string data and the values in the `ID` column are not picked when she takes the first 3 elements of each row.
However, I have seen other answers treat `ID` as a data column. For completeness, here is the code for the case where `ID` is a data column whose values must still NOT be picked among the first 3 elements of each row.
If `ID` is a data column that should not be picked, adjust the code slightly as follows:
df1 = df.set_index('ID') # temporarily set column ID as index
# same code as my main answer except to replace df by df1
df1['Var6'] = df1.apply(lambda x: ','.join([y for y in x.values if y != ''][:3]), axis=1)
df = df1.reset_index() # reset the index to move ID back to data column
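A variation on the same idea, sketched under the assumption that `ID` is an ordinary integer column and blanks are empty strings: instead of moving `ID` into the index and back, select the value columns by name and apply the same logic to that subset. The name `value_cols` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Var1": ["A", "", "", "A"],
        "Var2": ["B", "B", "", ""],
        "Var3": ["C", "C", "C", "C"],
        "Var4": ["D", "D", "D", ""],
        "Var5": ["E", "", "E", "E"],
    }
)

# Every column except ID takes part in the selection.
value_cols = [c for c in df.columns if c != "ID"]

df["Var6"] = df[value_cols].apply(
    lambda row: ",".join([v for v in row if v != ""][:3]), axis=1
)
print(df)
```

Both routes give the same result; this one simply leaves the index untouched.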
Upvotes: 3
Reputation: 323396
Try with `stack`, then `groupby` and `head`:
df['out'] = df.stack().loc[lambda x : x!=''].groupby(level=0).head(3).groupby(level=0).agg(','.join)
Out[277]:
0 A,B,C
1 B,C,D
2 C,D,E
3 A,C,E
dtype: object
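The chain above can be easier to follow when split into named steps. This is a sketch assuming an all-string frame with a default integer index and empty strings for blanks; the intermediate names (`long`, `non_empty`, `first3`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Var1": ["A", "", "", "A"],
        "Var2": ["B", "B", "", ""],
        "Var3": ["C", "C", "C", "C"],
        "Var4": ["D", "D", "D", ""],
        "Var5": ["E", "", "E", "E"],
    }
)

# 1. Reshape to long form: one value per (row label, column name) pair.
long = df.stack()

# 2. Drop the empty strings.
non_empty = long[long != ""]

# 3. Keep the first three remaining values of each original row (index level 0).
first3 = non_empty.groupby(level=0).head(3)

# 4. Join each row's values; the result is indexed by row label,
#    so the assignment aligns with df.
df["out"] = first3.groupby(level=0).agg(",".join)
print(df)
```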
Or, if every value is a single character (the slice `str[:5]` depends on it):
df.where(df=='',df+',').sum(1).str[:5]
Out[282]:
0 A,B,C
1 B,C,D
2 C,D,E
3 A,C,E
dtype: object
Upvotes: 3
Reputation: 2441
An alternative solution using df.values.tolist():
import pandas as pd
# Working example with empty strings
df = pd.DataFrame({"Var1": ["A", "", "", "A"],
"Var2": ["B", "B", "", ""],
"Var3": ["C", "C", "C", "C"],
"Var4": ["D", "D", "D", ""],
"Var5": ["E", "", "E", "E"]
})
df["Var6"] = df.values.tolist()
df["Var6"] = df["Var6"].map(lambda x: ",".join([i for i in x if i != ""][:3]))
df
# Working example with None
df = pd.DataFrame({"Var1": ["A", None, None, "A"],
"Var2": ["B", "B", None, None],
"Var3": ["C", "C", "C", "C"],
"Var4": ["D", "D", "D", None],
"Var5": ["E", None, "E", "E"]
})
df["Var6"] = df.values.tolist()
df["Var6"] = df["Var6"].map(lambda x: ",".join([i for i in x if i is not None][:3]))
df
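If a frame mixes both kinds of blanks, the two filters above can be combined. The following is a sketch rather than part of the original answer: `first_present` is a hypothetical helper that skips anything `pd.isna` reports as missing (None or NaN) as well as empty strings.

```python
import pandas as pd


def first_present(values, n=3):
    """Join the first n values that are neither missing nor empty strings."""
    present = [v for v in values if not pd.isna(v) and v != ""]
    return ",".join(present[:n])


# Sample data that mixes None and "" as blanks.
df = pd.DataFrame(
    {
        "Var1": ["A", None, "", "A"],
        "Var2": ["B", "B", None, ""],
        "Var3": ["C", "C", "C", "C"],
        "Var4": ["D", "D", "D", None],
        "Var5": ["E", "", "E", "E"],
    }
)

# df.values.tolist() yields one plain Python list per row.
df["Var6"] = [first_present(row) for row in df.values.tolist()]
print(df)
```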
Upvotes: 1
Reputation: 23099
You can use `stack` with `groupby.head(3)` and chain `groupby.agg(','.join)`. This assumes the empty cells are NaN, which `stack` drops by default in pandas 1.x and 2.x:
df['var6'] = df.set_index('ID',append=True)\
.stack().groupby(level=[0]).head(3)\
.groupby(level=0).agg(','.join)
print(df)
ID Var1 Var2 Var3 Var4 Var5 var6
0 1 A B C D E A,B,C
1 2 NaN B C D NaN B,C,D
2 3 NaN NaN C D E C,D,E
3 4 A NaN C NaN E A,C,E
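A self-contained sketch of the same chain, assuming blanks are NaN and `ID` is a regular column. The explicit `dropna()` is an addition of this sketch, so the result does not depend on whether the installed pandas version drops NaN inside `stack`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Var1": ["A", np.nan, np.nan, "A"],
        "Var2": ["B", "B", np.nan, np.nan],
        "Var3": ["C", "C", "C", "C"],
        "Var4": ["D", "D", "D", np.nan],
        "Var5": ["E", np.nan, "E", "E"],
    }
)

# Appending ID to the index keeps it out of the stacked values while the
# original row label stays at index level 0.
stacked = df.set_index("ID", append=True).stack().dropna()

# First three values per original row, then joined; the result is indexed by
# the original row label, so it aligns with df on assignment.
df["var6"] = stacked.groupby(level=0).head(3).groupby(level=0).agg(",".join)
print(df)
```

If the blanks are empty strings instead, converting them first with `df.replace("", np.nan)` would let the same chain apply.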
Upvotes: 3