Dmitrij Holkin

Reputation: 2055

Create multiple DataFrames from a single DataFrame based on conditions by columns

New to pandas and Python, so thank you in advance. I have a table:

LAB DATA

import pandas as pd

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)

Before writing the pandas DataFrame into a MySQL database table, I need to split df into separate tables, and then write each table to MySQL.

How can I split df by columns? Something like: if a column name contains the string "cas1", split it off into its own DataFrame:

for col in df.columns:
    if "cas1" in col:
        dfCas1 = df.split
        # add a unique index to identify which row each value belongs to
    if "cas2" in col:
        dfCas2 = df.split
        # add a unique index to identify which row each value belongs to
    if col in {"analyte", "id", ...}:  # main table
        dfMain = df.split

dfMain.to_sql("Main", dbConnection, if_exists='fail')
dfCas1.to_sql("cas1", dbConnection, if_exists='fail')
dfCas2.to_sql("cas2", dbConnection, if_exists='fail')
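A runnable sketch of the split described above (the `row_id` column name is my own choice, used to tie the sub-tables back to the original rows):

```python
import pandas as pd

# Data from the question (abbreviated)
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53}]
df = pd.DataFrame(data)

# Keep the original row number as a regular column so each
# sub-table row can be traced back to the row it came from
df = df.reset_index().rename(columns={'index': 'row_id'})

dfMain = df[['row_id', 'analyte']]
dfCas1 = df[['row_id', 'CAS1']].dropna(subset=['CAS1'])
dfCas2 = df[['row_id', 'CAS2', 'Value2']].dropna(subset=['CAS2'])
```

Each of the three DataFrames can then be written out with `to_sql` exactly as in the snippet above.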

Expected output: (shown as an image in the original post)

Upvotes: 0

Views: 197

Answers (2)

Lucvv

Reputation: 116

I'm not completely sure what you want to achieve, but I feel like you want to do something like splitting this:

+---------+----+------+--------+------+--------+
| Analyte | id | CAS1 | value1 | Cas2 | Value2 |
+---------+----+------+--------+------+--------+
|         |    |      |        |      |        |
+---------+----+------+--------+------+--------+

to this:

+---------+----+  +------+--------+  +------+--------+
| Analyte | id |  | CAS1 | value1 |  | Cas2 | Value2 |
+---------+----+  +------+--------+  +------+--------+
|         |    |  |      |        |  |      |        |
+---------+----+  +------+--------+  +------+--------+

The first one is obtained by calling e.g. df.loc[:, ['Analyte', 'id']]. For the other ones, adjust the column names.

As for the unique index mentioned in your code comments: df.loc[:] keeps the index of the original table. You can use df.reset_index() to reset it to a unique integer index. If you also want to drop empty rows in one of your sub-tables before writing, have a look at df.dropna().

Upvotes: 1

exokamen

Reputation: 36

I am not 100% sure if this is what you mean, but:

# `col` is a placeholder for the name of the string column to filter on
dfCas1 = df[df.col.str.contains('cas1')]
dfCas2 = df[df.col.str.contains('cas2')]
dfMain = df[~(df.col.str.contains('cas1') | df.col.str.contains('cas2'))]

The ~ sign negates the selection: dfMain contains all rows whose col values contain neither cas1 nor cas2. I hope this makes sense.
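A quick demonstration of this row-filtering approach, using a hypothetical string column named `col`:

```python
import pandas as pd

df = pd.DataFrame({'col': ['cas1-a', 'cas2-b', 'other'],
                   'val': [1, 2, 3]})

dfCas1 = df[df.col.str.contains('cas1')]
dfCas2 = df[df.col.str.contains('cas2')]
# ~ keeps only the rows matched by neither pattern
dfMain = df[~(df.col.str.contains('cas1') | df.col.str.contains('cas2'))]
```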

Upvotes: 1
