Dmitrij Holkin

Reputation: 2055

Create multiple DataFrames from a single DataFrame based on conditions by columns

New to pandas and Python, so thank you in advance. I have a table:

LAB DATA

import pandas as pd

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)

Before writing the pandas DataFrame into a MySQL database table, I need to split df into separate tables, and then write each table to MySQL.

How can I split df by columns? Something like: if a column name contains the string "cas1", split it off into its own DataFrame:

for col in df.columns:
    if "cas1" in col:
        dfCas1 = df.split
        # add a unique index to identify which row each value belongs to
    if "cas2" in col:
        dfCas2 = df.split
        # add a unique index to identify which row each value belongs to
    if col in {"analyte", "id", ...}:  # main table
        dfMain = df.split

dfMain.to_sql("Main", dbConnection, if_exists='fail')
dfCas1.to_sql("cas1", dbConnection, if_exists='fail')
dfCas2.to_sql("cas2", dbConnection, if_exists='fail')
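A runnable sketch of the split described above (the `row_id` column name is my own choice, used to tie the sub-tables back to the original rows):

```python
import pandas as pd

# Data from the question (abbreviated)
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53}]
df = pd.DataFrame(data)

# Keep the original row number as a regular column so each
# sub-table row can be traced back to the row it came from
df = df.reset_index().rename(columns={'index': 'row_id'})

dfMain = df[['row_id', 'analyte']]
dfCas1 = df[['row_id', 'CAS1']].dropna(subset=['CAS1'])
dfCas2 = df[['row_id', 'CAS2', 'Value2']].dropna(subset=['CAS2'])
```

Each of the three DataFrames can then be written out with `to_sql` exactly as in the snippet above.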

Expected output: (shown as an image in the original post)

Upvotes: 0

Views: 197

Answers (2)

Lucvv

Reputation: 116

I'm not completely sure what you want to achieve, but I feel like you want to do something like splitting this:

+---------+----+------+--------+------+--------+
| Analyte | id | CAS1 | value1 | Cas2 | Value2 |
+---------+----+------+--------+------+--------+
|         |    |      |        |      |        |
+---------+----+------+--------+------+--------+

to this:

+---------+----+  +------+--------+  +------+--------+
| Analyte | id |  | CAS1 | value1 |  | Cas2 | Value2 |
+---------+----+  +------+--------+  +------+--------+
|         |    |  |      |        |  |      |        |
+---------+----+  +------+--------+  +------+--------+

The first one is obtained by calling e.g. df.loc[:, ['Analyte', 'id']]. For the other ones, adjust the column names.

As for the unique index mentioned in your code comments: df.loc[:] keeps the index of the original table. You can use df.reset_index() to reset it to a unique integer index. If you also want to drop empty rows in one of your sub-tables before writing, have a look at df.dropna().

Upvotes: 1

exokamen

Reputation: 36

I am not 100% sure if this is what you mean, but:

# `col` is a placeholder for the name of the string column to filter on
dfCas1 = df[df.col.str.contains('cas1')]
dfCas2 = df[df.col.str.contains('cas2')]
dfMain = df[~(df.col.str.contains('cas1') | df.col.str.contains('cas2'))]

The ~ sign negates the selection: dfMain contains all rows whose col values contain neither cas1 nor cas2. I hope this makes sense.
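A quick demonstration of this row-filtering approach, using a hypothetical string column named `col`:

```python
import pandas as pd

df = pd.DataFrame({'col': ['cas1-a', 'cas2-b', 'other'],
                   'val': [1, 2, 3]})

dfCas1 = df[df.col.str.contains('cas1')]
dfCas2 = df[df.col.str.contains('cas2')]
# ~ keeps only the rows matched by neither pattern
dfMain = df[~(df.col.str.contains('cas1') | df.col.str.contains('cas2'))]
```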

Upvotes: 1
