Reputation: 81
I am unable to split a pandas Series whose values contain semicolons. Is it because I am selecting the column by its name ('Social_Media'), or is it because Python won't recognise a semicolon as a split character? Or is something wrong with my script?
# Filter out rows where Social_Media is NaN
df2 = df[df['Social_Media'].notnull()]
# Splitter for semicolon
df2['Social_Media'].apply(lambda x: x.split(';')[0])
#This is my output after the split
Timestamp
2017-06-01 18:10:46 Twitter;Facebook;Instagram;WhatsApp;Google+
2017-06-01 19:24:04 Twitter;Facebook;Instagram;WhatsApp;Google+
2017-06-01 19:25:21 Twitter;Facebook;Instagram;WhatsApp;Google+
This is the output I need:
Timestamp name_a name_b name_c name_d name_e
2017-06-01 18:10:46 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:24:04 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:25:21 Twitter Facebook Instagram WhatsApp Google+
Upvotes: 2
Views: 2215
Reputation: 863246
You can use str.split with expand=True, which returns a DataFrame with one column per piece:
# Store the result under a new name so the original df is not overwritten
out = df['Social_Media'].str.split(';', expand=True).add_prefix('name_')
print(out)
name_0 name_1 name_2 name_3 name_4
Timestamp
2017-06-01 18:10:46 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:24:04 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:25:21 Twitter Facebook Instagram WhatsApp Google+
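As for why the original attempt appeared to do nothing: neither the semicolon nor the column name is the problem. The apply call returns a new Series that is never assigned to anything, and the [0] keeps only the first piece of each string. A minimal sketch of the difference, assuming made-up sample data (the frame below is hypothetical and only mirrors the question's output, with one missing value added):

```python
import pandas as pd

# Hypothetical sample data; the second row is deliberately missing
df = pd.DataFrame(
    {"Social_Media": [
        "Twitter;Facebook;Instagram;WhatsApp;Google+",
        None,
        "Twitter;Facebook",
    ]},
    index=pd.to_datetime([
        "2017-06-01 18:10:46",
        "2017-06-01 19:24:04",
        "2017-06-01 19:25:21",
    ]),
)
df.index.name = "Timestamp"

# The approach from the question: this returns a NEW Series holding only
# the first piece of each string; df itself is unchanged unless the
# result is assigned back.
first_only = df["Social_Media"].dropna().apply(lambda x: x.split(";")[0])

# str.split passes missing values through, so no notnull() filter is needed.
# Rows with fewer pieces are padded with missing values.
parts = df["Social_Media"].str.split(";", expand=True).add_prefix("name_")
print(parts)
```

Because str.split leaves missing values as missing, the notnull() filter is optional here; keep it only if you want those rows dropped from the result.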
And for column names that use letters of the alphabet instead of numbers:
import string
L = list(string.ascii_lowercase)
names = dict(zip(range(len(L)), ['name_' + x for x in L]))
# Map the integer column labels 0, 1, 2, ... to name_a, name_b, name_c, ...
out = df['Social_Media'].str.split(';', expand=True).rename(columns=names)
print(out)
name_a name_b name_c name_d name_e
Timestamp
2017-06-01 18:10:46 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:24:04 Twitter Facebook Instagram WhatsApp Google+
2017-06-01 19:25:21 Twitter Facebook Instagram WhatsApp Google+
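A slightly shorter variant of the lettered naming, shown as a sketch rather than the only way to do it: assign the column labels directly and, if the original columns are still needed, join the pieces back. The sample frame and the names parts and result are hypothetical, and the approach assumes no more than 26 resulting columns.

```python
import string

import pandas as pd

# Hypothetical sample data mirroring the question's output
df = pd.DataFrame(
    {"Social_Media": ["Twitter;Facebook;Instagram;WhatsApp;Google+"] * 3},
    index=pd.to_datetime([
        "2017-06-01 18:10:46",
        "2017-06-01 19:24:04",
        "2017-06-01 19:25:21",
    ]),
)
df.index.name = "Timestamp"

parts = df["Social_Media"].str.split(";", expand=True)

# One letter per resulting column; assumes at most 26 columns
parts.columns = ["name_" + c for c in string.ascii_lowercase[: parts.shape[1]]]

# Optionally attach the new columns to the original frame (aligned on the index)
result = df.join(parts)
print(result)
```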
Upvotes: 1