Reputation: 71
I have a data frame with two columns (sometimes it has only one column, sometimes more). I am trying to write a program that finds the rows that contain a True value, and I also want to build the following dictionary. My first sublist starts with row 2 because that is the first row with a True value, and I start a new sublist each time I find another row with a True value.
In this dictionary the key 0 identifies the table (I have one PDF table that I read with camelot), and the value is the list of sublists.
pandas_dict = {0:[[2,3,4,5],[6,7,8,9,10,11,12,13,14,15,16],[17,18,19,20],[21,18,19,29],[30,31,32,33,34]]}
Upvotes: 0
Views: 515
Reputation: 5999
To get each row that contains at least one True, just do:
df[6] | df[7]
Or, if you have a variable number of columns, just select the rows that contain a True value:
df.any(axis='columns')
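As a quick check that the two forms agree, here is a minimal sketch on a small hand-made frame. The data is hypothetical; only the column labels 6 and 7 come from the question.

```python
import pandas as pd

# Hypothetical 4-row frame using the question's column labels 6 and 7.
df = pd.DataFrame({
    6: [False, True, False, False],
    7: [False, False, False, True],
})

mask_two_cols = df[6] | df[7]       # explicit OR of exactly two columns
mask_any = df.any(axis="columns")   # same mask, works for any number of columns

# Index labels of the rows that contain at least one True.
rows_with_true = df.index[mask_any].tolist()
print(rows_with_true)  # [1, 3]
```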
To generate your dict:
pandas_dict = {"0": [
    group.index.to_list()
    for _, group in df.groupby((df[6] | df[7]).cumsum())
]}
This works by (ab)using the fact that summing booleans in Python counts the True values. The cumulative sum therefore increases by one at every row that contains a True, which gives each group a unique label, and you just have to take the list of index labels from each group to build your list.
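To make the cumulative-sum trick concrete, here is a small sketch on a hand-written mask (the values are made up for illustration). Note that any rows before the first True get the label 0 and end up in a group of their own.

```python
import pandas as pd

# Hypothetical mask: True marks a row that should start a new sublist.
mask = pd.Series([False, False, True, False, False, True, False])

# Each True bumps the running total, so rows share a label until the next True.
group_ids = mask.cumsum()
print(group_ids.tolist())  # [0, 0, 1, 1, 1, 2, 2]
```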
Example with a randomly generated DataFrame:
import pandas as pd
import random

# 34 rows of random booleans in two columns labelled 6 and 7
df = pd.DataFrame(
    [
        [random.choice([True, False]), random.choice([True, False])]
        for _ in range(34)
    ],
    columns=[6, 7]
)
pandas_dict = {"0": [
    group.index.to_list()
    for _, group in df.groupby((df[6] | df[7]).cumsum())
]}
pandas_dict will contain, for my randomly generated df:
{'0': [[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9],
[10, 11],
[12],
[13],
[14],
[15, 16],
[17, 18, 19],
[20],
[21, 22],
[23, 24, 25],
[26],
[27],
[28, 29],
[30],
[31, 32],
[33]]}
Upvotes: 0
Reputation: 14949
One way:
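# pass axis by keyword: newer pandas versions reject the positional form df.any(1)
# .iloc[1:] drops the first group, assumed to be the rows before the first True
pandas_dict = {'0': df.groupby(df.any(axis=1).cumsum()).apply(
    lambda x: x.index.to_list()).iloc[1:].to_list()}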
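One caveat with slicing off the first group: if the very first row already contains a True, there is no leading group to discard and a valid sublist would be lost. A possible variant, sketched here with a hypothetical helper name (true_started_groups) and made-up data, skips only the group labelled 0:

```python
import pandas as pd

def true_started_groups(df):
    """Return lists of index labels, each starting at a row that has a True."""
    group_ids = df.any(axis=1).cumsum()
    # Label 0 only occurs for rows that come before the first True; skip it.
    return [
        group.index.to_list()
        for key, group in df.groupby(group_ids)
        if key > 0
    ]

# Hypothetical data: the first True is in row 2, the next one in row 5.
df = pd.DataFrame({
    6: [False, False, True, False, False, True, False],
    7: [False, False, False, False, False, True, False],
})
pandas_dict = {0: true_started_groups(df)}
print(pandas_dict)  # {0: [[2, 3, 4], [5, 6]]}
```

This uses the integer key 0 as in the question; swap in '0' if a string key is preferred.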
Upvotes: 1