Reputation: 69
I'm new in Python and Pandas.
I'm trying to filter a string dataframe by two conditions, to get a list or dataframe with indexs and columns that satisfy both conditions. I get this dataframe from a spreadsheet where each cell is YES or NOT.
df = pd.DataFrame([['YES', 'YES', 'NO', 'NO'], ['NO', 'YES', 'NO', 'NO'], ['NO', 'NO', 'NO', 'NO'], ['YES', 'NO', 'NO', 'YES']],
index=['task1', 'task2', 'task3', 'task4'],
columns=['David', 'Carol', 'Tony', 'Anna'])
df
David Carol Tony Anna
taks1 YES YES NO NO
task2 NO YES NO NO
task3 NO NO NO NO
task4 YES NO NO YES
I need to get something like this (two lists, dataframe, bidimensional array...):
David task1
David task4
Carol task1
Carol task2
Anna task4
I have used loc, but I cannot extend the filter for all the columns:
active = df.loc[lambda df1: df1['David'] == 'YES', :]
Rows and columns number are unknown when I read spreadsheet, therefore, I need to have a flexible solution for different tables size.
Upvotes: 3
Views: 185
Reputation: 5918
s = df.unstack().eq('YES')
Based on requirement:
Multiindex array
s[s].index
MultiIndex([('David', 'task1'),
('David', 'task4'),
('Carol', 'task1'),
('Carol', 'task2'),
( 'Anna', 'task4')],
)
List
s[s].index.to_list()
[('David', 'task1'),
('David', 'task4'),
('Carol', 'task1'),
('Carol', 'task2'),
('Anna', 'task4')]
Numpy array
s[s].index.to_numpy()
array([('David', 'task1'), ('David', 'task4'), ('Carol', 'task1'),
('Carol', 'task2'), ('Anna', 'task4')], dtype=object)
DataFrame
s[s].index.to_frame()
0 1
David task1 David task1
task4 David task4
Carol task1 Carol task1
task2 Carol task2
Anna task4 Anna task4
Proper Dataframe with column names
We are using unstack
and then fetching the rows with 'YES'. After that, all the effort is to get the output in desired format.
Code
df.unstack().eq('YES').to_frame().rename(columns={
(0):'val'}).query('val == True').reset_index().rename(
columns={'level_0':'index', 'level_1':'value'}).drop('val',1)
Output
index value
0 David task1
1 David task4
2 Carol task1
3 Carol task2
4 Anna task4
Upvotes: 0
Reputation: 18315
need to get something like this (two lists, dataframe, bidimensional array...):
Then how about
[*zip(*df[df == "YES"].stack().index)]
which gives
[('task1', 'task1', 'task2', 'task4', 'task4'),
('David', 'Carol', 'Carol', 'David', 'Anna')]
We get the entries that are YES
and then stack
the columns (names) next to the index (tasks). Since we are only interested in index, take .index
. Then some zip
ping turns the multiindex into two separate lists.
(If you want it sorted by names, then you can put in .sort_index(level=1)
after .stack()
.)
Upvotes: 1
Reputation: 2233
You can use pandas' melt
to convert the data frame to the long format, and then apply the condition
df_long = df.melt(value_vars=['David', 'Carol', 'Tony', 'Anna'], ignore_index=False).reset_index()
df_long.columns = ['Task', 'Name', 'Value']
print(df_long)
Task Name Value
0 task1 David YES
1 task2 David NO
2 task3 David NO
3 task4 David YES
4 task1 Carol YES
5 task2 Carol YES
6 task3 Carol NO
7 task4 Carol NO
8 task1 Tony NO
9 task2 Tony NO
10 task3 Tony NO
11 task4 Tony NO
12 task1 Anna NO
13 task2 Anna NO
14 task3 Anna NO
15 task4 Anna YES
df_long.loc[df_long['Value']=='YES', ['Name', 'Task']].reset_index(drop=True)
Name Task
0 David task1
1 David task4
2 Carol task1
3 Carol task2
4 Anna task4
Upvotes: 1
Reputation: 14949
Use melt
and then query
to filter the result -
result = df.melt(value_vars=['David', 'Carol', 'Tony', 'Anna'], ignore_index=False).query('value == "YES"')['variable']
Output -
task1 David
task4 David
task1 Carol
task2 Carol
task4 Anna
Upvotes: 0
Reputation: 1213
Your df is pivoted, thus you want to unpivot your table. Unpivoting can be achieved using .melt()
in pandas. Here's the full code.
>>> import pandas as pd
>>> df = pd.read_clipboard().reset_index().rename(columns={'index':'task'})
>>> print(df)
task David Carol Tony Anna
0 taks1 YES YES NO NO
1 task2 NO YES NO NO
2 task3 NO NO NO NO
3 task4 YES NO NO YES
>>> df = df.melt(['task'])
>>> print(df.loc[df['value']=='YES', ['variable', 'task']])
variable task
0 David taks1
3 David task4
4 Carol taks1
5 Carol task2
15 Anna task4
Upvotes: 0