Reputation: 2055
New to pandas and Python, so thank you in advance. I have a table:
import pandas as pd

# Create DataFrame
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11},
        {'analyte': 'shoe', 'CAS1': 4},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7, 'CAS2': 4, 'Value2': 6.53},
        {'analyte': 'sample2'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'eggs', 'CAS1': 2, 'CAS2': 1, 'Value2': 7.88},
        {'analyte': 'money', 'CAS1': 3},
        {'analyte': 'shoe', 'CAS1': 4, 'CAS2': 3, 'Value2': 15.5},
        {'analyte': 'boy', 'CAS1': 5},
        {'analyte': 'girl', 'CAS1': 6},
        {'analyte': 'onion', 'CAS1': 7}]
df = pd.DataFrame(data)
Before writing the pandas DataFrame into a MySQL database table, I need to split df into separate tables and then write each table to MySQL.
How can I split df by columns? Something like: if the column name contains the string "cas1", then split off those columns:
for col in df.columns:
    if "cas1" in col:
        dfCas1 = df.split
        # add a unique index to identify which row it belongs to
    if "cas2" in col:
        dfCas2 = df.split
        # add a unique index to identify which row it belongs to
    if {"analyte", "id", etc.} in col:  # main table
        dfMain = df.split

dfMain.to_sql("Main", dbConnection, if_exists='fail')
dfCas1.to_sql("cas1", dbConnection, if_exists='fail')
dfCas2.to_sql("cas2", dbConnection, if_exists='fail')
Upvotes: 0
Views: 197
Reputation: 116
I'm not completely sure what you want to achieve, but I feel like you want to do something like splitting this:
+---------+----+------+--------+------+--------+
| Analyte | id | CAS1 | value1 | Cas2 | Value2 |
+---------+----+------+--------+------+--------+
| | | | | | |
+---------+----+------+--------+------+--------+
to this:
+---------+----+ +------+--------+ +------+--------+
| Analyte | id | | CAS1 | value1 | | Cas2 | Value2 |
+---------+----+ +------+--------+ +------+--------+
| | | | | | | | |
+---------+----+ +------+--------+ +------+--------+
The first one is obtained by calling e.g. df.loc[:, ['Analyte', 'id']]. For the other ones, adjust the column names.
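Applied to the frame from the question (which has analyte, CAS1, CAS2 and Value2 columns rather than the illustrative names above), a minimal sketch of that column split might look like this:

import pandas as pd

# A shortened version of the question's data, just for illustration.
data = [{'analyte': 'sample1'},
        {'analyte': 'bacon', 'CAS1': 1},
        {'analyte': 'money', 'CAS1': 3, 'CAS2': 1, 'Value2': 1.11}]
df = pd.DataFrame(data)

# Pick each sub-table by column label; adjust the lists to your real columns.
dfMain = df.loc[:, ['analyte']]
dfCas1 = df.loc[:, ['CAS1']]
dfCas2 = df.loc[:, ['CAS2', 'Value2']]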
Now for the unique index mentioned in your code comments: df.loc[:] keeps the index of the original table. You can use df.reset_index() to reset it to a unique integer index. If you also want to drop empty rows in one of your subtables before parsing, have a look at df.dropna().
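For instance, to drop the rows of the CAS2 sub-table that hold no data while still keeping a link back to the main table, a small sketch (continuing with the dfCas2 frame from above):

# Drop rows where every CAS2-related value is missing.
dfCas2 = dfCas2.dropna(how='all')

# reset_index() turns the old row labels into an 'index' column,
# so each CAS2 row still identifies the analyte row it belongs to.
dfCas2 = dfCas2.reset_index()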
Upvotes: 1
Reputation: 36
I am not 100% sure if this is what you mean, but:
dfCas1 = df.loc[:, df.columns.str.contains('cas1', case=False)]
dfCas2 = df.loc[:, df.columns.str.contains('cas2', case=False)]
dfMain = df.loc[:, ~(df.columns.str.contains('cas1', case=False) |
                     df.columns.str.contains('cas2', case=False))]
The ~ sign negates the selection, i.e. dfMain keeps every column whose name contains neither cas1 nor cas2. I hope this makes sense.
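As a rough end-to-end sketch of then writing the three pieces to MySQL (the SQLAlchemy connection string, driver and table names below are placeholders, not part of the question):

from sqlalchemy import create_engine

# Placeholder credentials/host/database - replace with your own.
engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

# index=True (the default) writes the shared row index,
# so the three tables can be joined back together on it.
dfMain.to_sql('Main', engine, if_exists='fail', index=True)
dfCas1.to_sql('cas1', engine, if_exists='fail', index=True)
dfCas2.to_sql('cas2', engine, if_exists='fail', index=True)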
Upvotes: 1