user3625533
user3625533

Reputation: 415

filtering data in dataframe based on a column

Need to filter the data in dataframe and create a single row out of it. I tried to group by but want different values to filter so am failed to do so. Here is the example

#1      #2      #3      #4
1       Bob     10      Eng
2       Jack    21      Tel
3       Rian    11      Tam
4       Vick    10      Ram
5       Jick    20      Mar
6       Rams    10      Mal
7       Venk    21      Mar

First need to filter the rows on column #3 as 10 and followed by 21 means

#1      #2      #3      #4
1       Bob     10      Eng
2       Jack    21      Tel
3       Rams    10      Mal
4       Venk    21      Mal

Then convert this 2 row data into a single row like below

#1      #2      #3      #4      #5
1       Bob     Jack    Eng     Tel
2       Jick    Venk    Mal     Mar

Being a beginner i tried grouping in several ways but the values need to be picked is in order first row should contain 10 and the immediate row should be 21.

Can anybody suggest which API in pandas will help me achieving this?

Upvotes: 2

Views: 117

Answers (4)

Manav Shah
Manav Shah

Reputation: 32

Lists are ordered iterables, and the filter generator reads each item in, one at a time in that order. Therefore, it will yield output in order.

The filter function takes boolean value as input.

list(filter(lambda a: a is 10 or a is 21, col_3))

Upvotes: 0

Yuya Takashina
Yuya Takashina

Reputation: 632

If 21 always comes just after 10, you may like:

df_shifted = df.shift(-1, axis=0)  # df is the original DataFrame you have.
is_ok = (df["#3"] == 10) & (df_shifted["#3"] == 21)
ans = pd.concat([df[is_ok], df_shifted[is_ok]], axis=1).drop(columns="#3")

This will return:

  #2   #4  #2   #4
1 Bob  Eng Jack Tel
6 Rams Mal Venk Mar

Upvotes: 0

U13-Forward
U13-Forward

Reputation: 71580

Do you mean by something like?

>>> x = df.loc[(df['#3'].eq(10) & df['#3'].shift(-1).eq(21)) | (df['#3'].shift().eq(10) & df['#3'].eq(21))]
>>> x.groupby(np.arange(len(x)) // 2).apply(lambda x: pd.DataFrame(x.drop(["#1", "#3"], axis=1).to_numpy().flatten()).T).reset_index(drop=True).rename(lambda x: f'#{x + 1}', axis=1)
     #1   #2    #3   #4
0   Bob  Eng  Jack  Tel
1  Rams  Mal  Venk  Mar
>>> 

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195438

Try:

mask = df["#3"].eq(21) & df["#3"].shift().eq(10)
mask = mask.shift(-1).fillna(False) | mask
x = (
    df[mask]
    .groupby(np.arange(mask.sum()) // 2)
    .apply(lambda x: list(x["#2"]) + list(x["#4"]))
).apply(pd.Series)
x.columns = [f"#{x+2}" for x in x.columns]
x = x.reset_index().rename(columns={"index": "#1"})
x["#1"] += 1
print(x)

Prints:

   #1    #2    #3   #4   #5
0   1   Bob  Jack  Eng  Tel
1   2  Rams  Venk  Mal  Mar

Upvotes: 1

Related Questions