nico
nico

Reputation: 69

Get indexs and columns from a Dataframe that satisfy two conditions

I'm new in Python and Pandas.

I'm trying to filter a string dataframe by two conditions, to get a list or dataframe with indexs and columns that satisfy both conditions. I get this dataframe from a spreadsheet where each cell is YES or NOT.

df = pd.DataFrame([['YES', 'YES', 'NO', 'NO'], ['NO', 'YES', 'NO', 'NO'], ['NO', 'NO', 'NO', 'NO'], ['YES', 'NO', 'NO', 'YES']],

     index=['task1', 'task2', 'task3', 'task4'],

     columns=['David', 'Carol', 'Tony', 'Anna'])

df

            David   Carol   Tony   Anna
taks1       YES     YES     NO     NO
task2       NO      YES     NO     NO
task3       NO      NO      NO     NO
task4       YES     NO      NO     YES

I need to get something like this (two lists, dataframe, bidimensional array...):

David   task1
David   task4
Carol   task1
Carol   task2
Anna    task4

I have used loc, but I cannot extend the filter for all the columns:

active = df.loc[lambda df1: df1['David'] == 'YES', :]

Rows and columns number are unknown when I read spreadsheet, therefore, I need to have a flexible solution for different tables size.

Upvotes: 3

Views: 185

Answers (5)

Utsav
Utsav

Reputation: 5918

s = df.unstack().eq('YES')

Based on requirement:

Multiindex array

s[s].index

MultiIndex([('David', 'task1'),
            ('David', 'task4'),
            ('Carol', 'task1'),
            ('Carol', 'task2'),
            ( 'Anna', 'task4')],
           )

List

s[s].index.to_list()

[('David', 'task1'),
 ('David', 'task4'),
 ('Carol', 'task1'),
 ('Carol', 'task2'),
 ('Anna', 'task4')]

Numpy array

s[s].index.to_numpy()

array([('David', 'task1'), ('David', 'task4'), ('Carol', 'task1'),
       ('Carol', 'task2'), ('Anna', 'task4')], dtype=object)

DataFrame

s[s].index.to_frame()
                0       1
David   task1   David   task1
        task4   David   task4
Carol   task1   Carol   task1
        task2   Carol   task2
Anna    task4   Anna    task4

Proper Dataframe with column names

We are using unstack and then fetching the rows with 'YES'. After that, all the effort is to get the output in desired format.

Code

df.unstack().eq('YES').to_frame().rename(columns={
    (0):'val'}).query('val == True').reset_index().rename(
    columns={'level_0':'index', 'level_1':'value'}).drop('val',1)

Output

    index   value
0   David   task1
1   David   task4
2   Carol   task1
3   Carol   task2
4   Anna    task4

Upvotes: 0

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

need to get something like this (two lists, dataframe, bidimensional array...):

Then how about

[*zip(*df[df == "YES"].stack().index)]

which gives

[('task1', 'task1', 'task2', 'task4', 'task4'),
 ('David', 'Carol', 'Carol', 'David', 'Anna')]

We get the entries that are YES and then stack the columns (names) next to the index (tasks). Since we are only interested in index, take .index. Then some zipping turns the multiindex into two separate lists.

(If you want it sorted by names, then you can put in .sort_index(level=1) after .stack().)

Upvotes: 1

imdevskp
imdevskp

Reputation: 2233

You can use pandas' melt to convert the data frame to the long format, and then apply the condition

df_long = df.melt(value_vars=['David', 'Carol', 'Tony', 'Anna'], ignore_index=False).reset_index()
df_long.columns = ['Task', 'Name', 'Value']
print(df_long)
     Task   Name Value
0   task1  David   YES
1   task2  David    NO
2   task3  David    NO
3   task4  David   YES
4   task1  Carol   YES
5   task2  Carol   YES
6   task3  Carol    NO
7   task4  Carol    NO
8   task1   Tony    NO
9   task2   Tony    NO
10  task3   Tony    NO
11  task4   Tony    NO
12  task1   Anna    NO
13  task2   Anna    NO
14  task3   Anna    NO
15  task4   Anna   YES
         
df_long.loc[df_long['Value']=='YES', ['Name', 'Task']].reset_index(drop=True)
    Name   Task
0  David  task1
1  David  task4
2  Carol  task1
3  Carol  task2
4   Anna  task4

Upvotes: 1

Nk03
Nk03

Reputation: 14949

Use melt and then query to filter the result -

result = df.melt(value_vars=['David', 'Carol', 'Tony', 'Anna'], ignore_index=False).query('value == "YES"')['variable']

Output -

task1    David
task4    David
task1    Carol
task2    Carol
task4     Anna

Upvotes: 0

Gusti Adli
Gusti Adli

Reputation: 1213

Your df is pivoted, thus you want to unpivot your table. Unpivoting can be achieved using .melt() in pandas. Here's the full code.

>>> import pandas as pd
>>> df = pd.read_clipboard().reset_index().rename(columns={'index':'task'})
>>> print(df)
    task David Carol Tony Anna
0  taks1   YES   YES   NO   NO
1  task2    NO   YES   NO   NO
2  task3    NO    NO   NO   NO
3  task4   YES    NO   NO  YES
>>> df = df.melt(['task'])
>>> print(df.loc[df['value']=='YES', ['variable', 'task']])
   variable   task
0     David  taks1
3     David  task4
4     Carol  taks1
5     Carol  task2
15     Anna  task4

Upvotes: 0

Related Questions