Reputation: 126
I have a column of objects in a pandas dataframe. Each row in this column has multiple values as a string. I want to create a new column of a nice list with one code on each row. I know of a few really sloppy ways to do this, but none of them feel right. Here we go:
The Data
data = [['A01.001 - Apple; R02.049 - Banana; B32:111 - Candy'],
['C30.086 - Deer; V83.038 - Ears; U23.133 - Race Car'],
['H14.200 - Silver; B32.111 - Candy']]
my_series = pd.DataFrame(data, columns=['Column'])
Table
Column
______
0 A01.001 - Apple; R02.049 - Banana; B32.111 - Candy
1 C30.086 - Deer; V83.038 - Ears; U23.133 - Race Car
2 H14.200 - Silver; B32.111 - Candy
Desired Result (I'm ok with a duplicates, can get rid of those later)
New Column
___________
A01.001 - Apple
R02.049 - Banana
B32.111 - Candy
C30.086 - Deer
V83.038 - Ears
U23.133 - Race Car
H14.200 - Silver
B32.111 - Candy
My Crappy Code
a_list = []
def get_data(data):
for i in data:
pattern = r'[^;]+'
regex = re.findall(pattern, i)
for match in regex:
a_list.append(match)
get_data(my_series)
print(a_list)
(I'm not concerned with turning this list back to a Series at the moment, there will be a few destinations for this list. This code returns about 50,000 results, and dropping duplicates reduces it to about 10,000, if that matters to anyone.)
Despite the fact that this seemed to 'work', I know it's crap and here's why I think it's crap. I declared a variable and my function secretly writes to it. My gut tells me this is really stupid, but I've been at this for way longer than it should have taken. I know I'm doing something wrong, but I can't figure out what it is. I haven't had any luck finding an example of this scenario on the internet, so I appreciate anyone who can take a swing at it, maybe smack me in the head. Thank you for your time!
Upvotes: 1
Views: 86
Reputation: 323316
You just need split
the columns and re-create the dataframe
df=pd.DataFrame(my_series.Column.str.split(';').sum(),columns=['columns'])
df
columns
0 A01.001 - Apple
1 R02.049 - Banana
2 B32:111 - Candy
3 C30.086 - Deer
4 V83.038 - Ears
5 U23.133 - Race Car
6 H14.200 - Silver
7 B32.111 - Candy
Upvotes: 4