Jenny Jing Yu

How to get the first 3 elements with values of each row (in column order) in Python

I have a DataFrame in which every value is a string, like below:

ID Var1 Var2 Var3 Var4 Var5
 1    A    B    C    D    E
 2         B    C    D
 3              C    D    E
 4    A         C         E

I would like to pick the first 3 non-empty elements of each row, in column order, and put them in a new comma-separated column Var6, like below:

ID Var1 Var2 Var3 Var4 Var5   Var6
 1    A    B    C    D    E  A,B,C
 2         B    C    D       B,C,D
 3              C    D    E  C,D,E
 4    A         C         E  A,C,E
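For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the sample data. It assumes the blank cells are empty strings (the question does not say whether they are empty strings or NaN) and that ID is an ordinary integer column; expected_var6 is just a hypothetical name for the desired output.

```python
import pandas as pd

# Sample data from the question.
# Assumptions: blank cells are empty strings and ID is an integer data column.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Var1": ["A", "", "", "A"],
    "Var2": ["B", "B", "", ""],
    "Var3": ["C", "C", "C", "C"],
    "Var4": ["D", "D", "D", ""],
    "Var5": ["E", "", "E", "E"],
})

# Desired contents of the new Var6 column, one entry per row
expected_var6 = ["A,B,C", "B,C,D", "C,D,E", "A,C,E"]
print(df)
```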

Answers (4)

SeaBean

Try this:

df['Var6'] = df.apply(lambda x: [y for y in x.values if y != ''][:3], axis=1)

Resulting df:

  Var1 Var2 Var3 Var4 Var5       Var6
1    A    B    C    D    E  [A, B, C]
2         B    C    D       [B, C, D]
3              C    D    E  [C, D, E]
4    A         C         E  [A, C, E]

If you want the result as a comma-separated string instead of a list, additionally use:

df['Var6'] = df['Var6'].str.join(', ')

Resulting df:

  Var1 Var2 Var3 Var4 Var5     Var6
1    A    B    C    D    E  A, B, C
2         B    C    D       B, C, D
3              C    D    E  C, D, E
4    A         C         E  A, C, E

If you want to do it in one step, use:

df['Var6'] = df.apply(lambda x: ','.join([y for y in x.values if y != ''][:3]), axis=1)

Edit

When writing my answer, I interpreted ID in the sample data as the row index, mainly because the OP said the DataFrame holds strings, and the values in the ID column are never among the first 3 elements picked from each row.

However, I have seen other answers treat ID as a data column. For completeness, I would like to add code for the case where ID is a data column whose values should still NOT be picked among the first 3 elements of each row.

In that case, slightly adjust the code as follows:

df1 = df.set_index('ID')      # temporarily set column ID as index
# same code as my main answer except to replace df by df1
df1['Var6'] = df1.apply(lambda x: ','.join([y for y in x.values if y != ''][:3]), axis=1)
df = df1.reset_index()        # reset the index to move ID back to data column 
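A self-contained sketch of why the set_index step matters, assuming ID is an integer column and blanks are empty strings; first3 is a hypothetical helper wrapping the lambda above. Without moving ID into the index, the integer ID counts as a non-empty value and ','.join fails on it:

```python
import pandas as pd

# Assumption: ID is an integer data column and blanks are empty strings
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Var1": ["A", "", "", "A"],
    "Var2": ["B", "B", "", ""],
    "Var3": ["C", "C", "C", "C"],
    "Var4": ["D", "D", "D", ""],
    "Var5": ["E", "", "E", "E"],
})

def first3(row):
    """Hypothetical helper: join the first 3 non-empty values of a row with commas."""
    return ",".join([y for y in row.values if y != ""][:3])

# Applied directly, the integer ID is treated as a non-empty value,
# and ",".join raises TypeError because it only accepts strings.
raised = False
try:
    df.apply(first3, axis=1)
except TypeError:
    raised = True
print("TypeError without set_index:", raised)

df1 = df.set_index("ID")        # move ID into the index so it is never picked
df1["Var6"] = df1.apply(first3, axis=1)
df = df1.reset_index()          # restore ID as the first data column
print(df)
```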

BENY

Try stack, then groupby().head(3), then join each group:

df['out'] = df.stack().loc[lambda x: x != ''].groupby(level=0).head(3).groupby(level=0).agg(','.join)
df['out']
Out[277]: 
0    A,B,C
1    B,C,D
2    C,D,E
3    A,C,E
dtype: object
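To see what each step of the stack/groupby chain does, here is a minimal sketch that names the intermediate results. It assumes the DataFrame holds only the Var columns, with empty strings for blanks (an integer ID column would end up in the stacked values and break ','.join); the intermediate variable names are hypothetical:

```python
import pandas as pd

# Assumption: only the Var columns, blanks are empty strings
df = pd.DataFrame({
    "Var1": ["A", "", "", "A"],
    "Var2": ["B", "B", "", ""],
    "Var3": ["C", "C", "C", "C"],
    "Var4": ["D", "D", "D", ""],
    "Var5": ["E", "", "E", "E"],
})

# 1. stack: one entry per (row, column) pair, in row-major order
stacked = df.stack()
# 2. drop the empty strings
non_empty = stacked.loc[lambda s: s != ""]
# 3. keep the first 3 entries of each original row (level 0 of the MultiIndex)
first_three = non_empty.groupby(level=0).head(3)
# 4. join the kept values of each row into one string
out = first_three.groupby(level=0).agg(",".join)
print(out)
```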

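Or, if every value is a single character (so that 3 values plus 2 commas are exactly 5 characters):

df.where(df == '', df + ',').sum(axis=1).str[:5]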
Out[282]: 
0    A,B,C
1    B,C,D
2    C,D,E
3    A,C,E
dtype: object
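The .str[:5] slice only works when every value is a single character. A hedged sketch of a variant that also handles longer values, assuming no value itself contains a comma: build the same trailing-comma string per row, then split it and keep the first 3 pieces. The multi-character data is hypothetical:

```python
import pandas as pd

# Hypothetical data with multi-character values; blanks are empty strings
df = pd.DataFrame({
    "Var1": ["AA", "", "", "A"],
    "Var2": ["B", "BB", "", ""],
    "Var3": ["C", "C", "CCC", "C"],
    "Var4": ["D", "D", "D", ""],
    "Var5": ["E", "", "E", "E"],
})

# Append a comma to every non-empty cell, then concatenate each row,
# e.g. row 0 becomes "AA,B,C,D,E,"
joined = df.where(df == "", df + ",").sum(axis=1)

# The fixed-width slice truncates longer values: row 0 becomes "AA,B,"
naive = joined.str[:5]

# Drop the trailing comma, split, keep the first 3 pieces, re-join
out = joined.str.rstrip(",").str.split(",").str[:3].str.join(",")
print(out)
```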

datapug

An alternative solution using df.values.tolist():

import pandas as pd
# Working example with empty strings 
df = pd.DataFrame({"Var1": ["A", "", "", "A"],
                   "Var2": ["B", "B",  "", ""],
                   "Var3": ["C",  "C", "C", "C"],
                   "Var4": ["D", "D", "D", ""],
                   "Var5": ["E", "", "E", "E"]
                  })
df["Var6"] = df.values.tolist()
df["Var6"] = df["Var6"].map(lambda x: ",".join([i for i in x if i != ""][:3]))
df

# Working example with None
df = pd.DataFrame({"Var1": ["A", None, None, "A"],
                   "Var2": ["B", "B",  None, None],
                   "Var3": ["C",  "C", "C", "C"],
                   "Var4": ["D", "D", "D", None],
                   "Var5": ["E", None, "E", "E"]
                  })
df["Var6"] = df.values.tolist()
df["Var6"] = df["Var6"].map(lambda x: ",".join([i for i in x if i is not None][:3]))
df
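The two examples above differ only in how a blank is recognised. If the blanks might be a mix of empty strings, None and NaN, one filter can cover all three. A minimal sketch, assuming that "has a value" means pd.notna(v) and v != ""; the mixed data and the has_value helper are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing the three kinds of blanks
df = pd.DataFrame({
    "Var1": ["A", None, "", "A"],
    "Var2": ["B", "B", np.nan, ""],
    "Var3": ["C", "C", "C", "C"],
    "Var4": ["D", "D", "D", None],
    "Var5": ["E", "", "E", "E"],
})

def has_value(v):
    """True for real values; False for None, NaN and empty strings."""
    # pd.notna is checked first, so v != "" is never evaluated for None/NaN
    return pd.notna(v) and v != ""

# Same idea as above: walk each row as a list, keep values, take the first 3
df["Var6"] = [",".join([v for v in row if has_value(v)][:3]) for row in df.values.tolist()]
print(df)
```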

Umar.H

You can use stack with groupby().head(3) and then chain groupby().agg(','.join):

df['var6'] = df.set_index('ID',append=True)\
               .stack().groupby(level=[0]).head(3)\
                       .groupby(level=0).agg(','.join)

print(df)

   ID Var1 Var2 Var3 Var4 Var5   var6
0   1    A    B    C    D    E  A,B,C
1   2  NaN    B    C    D  NaN  B,C,D
2   3  NaN  NaN    C    D    E  C,D,E
3   4    A  NaN    C  NaN    E  A,C,E
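This relies on stack() dropping the NaN cells, which is what the long-standing stack implementation does by default. Newer pandas versions are moving to a stack implementation (future_stack=True) that keeps missing values, so a hedged variant adds an explicit dropna(), which is harmless under either behaviour. A minimal sketch, assuming blanks are NaN as in the output above:

```python
import numpy as np
import pandas as pd

# Assumption: blanks are NaN and ID is an integer data column
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Var1": ["A", np.nan, np.nan, "A"],
    "Var2": ["B", "B", np.nan, np.nan],
    "Var3": ["C", "C", "C", "C"],
    "Var4": ["D", "D", "D", np.nan],
    "Var5": ["E", np.nan, "E", "E"],
})

df["var6"] = (
    df.set_index("ID", append=True)   # keep ID out of the stacked values
      .stack()
      .dropna()                       # explicit, instead of relying on stack's default
      .groupby(level=0).head(3)       # first 3 values per original row
      .groupby(level=0).agg(",".join) # index is the original row index again
)
print(df)
```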
