Reputation: 47
I have an Excel file that I've already loaded into a pandas DataFrame and done some analysis on. The issue I'm facing is how to separate multiple values that are given in the same row; they're differentiated like this: a) name1 ; b) name2
As a beginner in pandas, I can't work out the logic for pulling apart the multiple values stored in a single column.
This is the dataset I'm working on, and I'm unsure how to separate the multiple values given in the same row.
Upvotes: 0
Views: 676
Reputation: 1434
You can use .str.split() to split the column into two and then .str.lstrip() to remove the (a) and (b):
>>> import pandas as pd
>>> df = pd.DataFrame({"Chronic medical conditions": ["(a) BP; (b) Diabetes", "(a) Diabetes; (b) high BP"]})
>>> df
   Chronic medical conditions
0        (a) BP; (b) Diabetes
1   (a) Diabetes; (b) high BP
>>> df = df["Chronic medical conditions"].str.split(';', expand=True)
>>> df.columns = ["a", "b"]  # rename columns as necessary
>>> df
              a              b
0        (a) BP   (b) Diabetes
1  (a) Diabetes    (b) high BP
>>> df["a"] = df["a"].str.lstrip("(a) ")
>>> df["b"] = df["b"].str.lstrip(" (b)")
>>> df
          a         b
0        BP  Diabetes
1  Diabetes   high BP
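One caveat: .str.lstrip() removes any leading characters that belong to the given set, not a literal prefix, so a value such as "(a) asthma" would lose its first letter. Below is a minimal sketch of a regex-based alternative. It assumes each entry starts with a single lowercase letter followed by a closing parenthesis (with or without an opening one) and that entries are separated by semicolons; the sample rows and the `parts` name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: some values begin with the same letter as their marker.
df = pd.DataFrame(
    {
        "Chronic medical conditions": [
            "(a) asthma; (b) bronchitis",
            "(a) Diabetes; (b) high BP",
        ]
    }
)

# Split each row on ';' into separate columns.
parts = df["Chronic medical conditions"].str.split(";", expand=True)
parts.columns = ["a", "b"]  # rename columns as necessary

# Remove a leading "(x)" or "x)" marker, then trim any leftover whitespace.
for col in parts.columns:
    parts[col] = (
        parts[col]
        .str.replace(r"^\s*\(?[a-z]\)\s*", "", regex=True)
        .str.strip()
    )

print(parts)
```

Because the pattern is anchored at the start of the string, only the marker is removed and the value itself is left untouched. If rows can hold more than two entries, the column names would need to be adjusted to match.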
Upvotes: 1