Reputation: 45
dataframe 'df' has the following data -
Column A | Column B |
---|---|
Item_ID1 | Information - information for item that has ID as 1\nPrice - $7.99\nPlace - Albany, NY |
Item_ID2 | Information - item's information with ID as 2\nPrice - $5.99\nPlace - Ottawa, ON |
How to segregate the values from column B using 'Information', 'Price', and 'Place' into different columns like -
Column A | Information | Price | Place |
---|---|---|---|
Item_ID1 | information for item that has ID as 1 | $7.99 | Albany, NY |
Item_ID2 | item's information with ID as 2 | $5.99 | Ottawa, ON |
I tried splitting the column B based on string values like 'Information - ', 'Price - ', 'Place - ' but that is becoming more complicated and the very first slice has information on Price and Place which is not required in it.
Upvotes: 1
Views: 350
Reputation: 25323
Another possible solution, based on the following ideas:
Split Column B
by \s-\s|\\n
, using pandas.Series.str.split
.
Reshape the result, using numpy.reshape
.
Apply pandas.pivot_table
.
(pd.concat([df['Column A'], pd.DataFrame(
df['Column B'].str.split(r'\s-\s|\\n', expand=True, regex=True).values
.reshape((-1,2)))
.pivot_table(columns=0, values=1, aggfunc=list)
.pipe(lambda d: d.explode(d.columns.tolist(), ignore_index=True))], axis=1))
Output:
Column A Information Place Price
0 Item_ID1 information for item that has ID as 1 Albany, NY $7.99
1 Item_ID2 item's information with ID as 2 Ottawa, ON $5.99
Upvotes: 1
Reputation: 260510
For a generic method in which you do not need to know the future columns in advance, you can use str.extractall
and a pivot
:
out = df.drop(columns='Column B').join(
df['Column B']
.str.extractall(r'([^-]+) - ([^\n]+)\n?')
.droplevel('match')
.pivot(columns=0, values=1)
)
NB. I am assuming that you have real newlines, if you have instead the two characters \
and n
, you can convert with df['Column B'] = df['Column B'].str.replace(r'\\n', '\n')
Output:
Column A Information Place Price
0 Item_ID1 information for item that has ID as 1 Albany, NY $7.99
1 Item_ID2 item's information with ID as 2 Ottawa, ON $5.99
Upvotes: 2
Reputation: 37767
You can approach this by using pandas.Series.split
:
df[["Information", "Price", "Place"]]= df.pop("Column B").str.split(r"\\n", expand=True)
df= df.astype(str).apply(lambda x: x.replace(x.name, "", regex=True).str.strip(" - "))
print(df.to_string())
Column A Information Price Place
0 Item_ID1 information for item that has ID as 1 $7.99 Albany, NY
1 Item_ID2 item's information with ID as 2 $5.99 Ottawa, ON
Upvotes: 2