Reputation: 56
I need your help on something like this:
My input (after reading the .csv file):
import pandas as pd

data = {'A': ['000', '001', '002'],
        'B': ['Name0', 'Name1', 'Name2 @35 @DI @003 @Name3 @68 @DI'],
        'C': [27, 24, 35],
        'D': ['@DI', '@DI', '@DI']}
df = pd.DataFrame(data)
My desired output:
data = {'A': ['000', '001', '002', '003'],
        'B': ['Name0', 'Name1', 'Name2', 'Name3'],
        'C': [27, 24, 35, 68],
        'D': ['@DI', '@DI', '@DI', '@DI']}
I don't know how to explain better than this.
I appreciate the help.
Thanks!
Upvotes: -1
Views: 60
Reputation: 14949
You can split the string in column B on '@' and then regroup the pieces in blocks of four (the split function below does that). Finally, overwrite the affected columns with the regrouped lists and use Series.explode to turn each list element into its own row.
Check whether that's what you need; then I can explain in detail.
import numpy as np
import pandas as pd

# Break column B into its '@'-separated pieces.
df['split'] = df['B'].str.split('@')

def split(row_id, parts):
    # Rows without any '@' need no regrouping.
    if len(parts) <= 1:
        return np.nan
    a_list = [row_id]
    b_list = []
    c_list = []
    d_list = []
    for index, part in enumerate(parts):
        part = part.strip()  # drop the spaces left around each '@'
        if index % 4 == 0:
            b_list.append(part)
        elif (index + 1) % 4 == 0:
            a_list.append(part)
        elif (index + 2) % 4 == 0:
            d_list.append('@' + part)
        else:
            c_list.append(part)
    return [a_list, b_list, c_list, d_list]

df['split'] = df.apply(lambda x: split(x['A'], x['split']), axis=1)

# Overwrite A-D with the regrouped lists, only where a split happened.
mask = df['split'].notna()
df['C'] = df['C'].astype(object)  # so that lists can be stored in C
for pos, col in enumerate(['A', 'B', 'C', 'D']):
    df.loc[mask, col] = df.loc[mask, 'split'].str[pos]
df = df.drop('split', axis=1)

# Turn every list element into its own row.
df = df.apply(pd.Series.explode)
df['C'] = df['C'].astype(int)  # the values taken from B are strings
Output -
     A      B   C    D
0  000  Name0  27  @DI
1  001  Name1  24  @DI
2  002  Name2  35  @DI
2  003  Name3  68  @DI
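To see the explode step in isolation, here is a minimal sketch on a toy frame (the toy data and the names `toy` and `exploded` are made up for illustration). It also shows why index 2 appears twice in the output above: explode repeats the original row index, and `reset_index(drop=True)` renumbers the rows if that is not wanted.

```python
import pandas as pd

# Toy frame (illustrative only): the second row holds two records as lists.
toy = pd.DataFrame({
    'A': ['000', ['002', '003']],
    'B': ['Name0', ['Name2', 'Name3']],
})

# Explode every column; scalars stay as they are, lists become one row per item.
exploded = toy.apply(pd.Series.explode)
print(exploded.index.tolist())   # [0, 1, 1]

# Renumber the rows if the repeated index is not wanted.
exploded = exploded.reset_index(drop=True)
print(exploded['A'].tolist())    # ['000', '002', '003']
```

This only works when the lists in one row have the same length in every column, which is the case for the lists built by the split function.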
Upvotes: 1
Reputation: 174
This looks more like manual work than a technical problem :)
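import pandas as pd

# The extra records are packed into row 2 of column B, separated by spaces.
entry_list = df.loc[2, 'B'].split(' ')
df.loc[2, 'B'] = entry_list[0]
# Skip the name and the two tokens that repeat row 2's own C and D.
entry_list = entry_list[3:]
lines = []
for i in range(0, len(entry_list), 4):
    raw_line = entry_list[i:i+4]
    # Strip '@' from A, B and C; D keeps it.
    line = [item.replace('@', '') for item in raw_line[:-1]]
    line[2] = int(line[2])  # C is numeric in the original frame
    line.append(raw_line[-1])
    lines.append(line)
df = pd.concat([df, pd.DataFrame(lines, columns=df.columns)]).reset_index(drop=True)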
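The snippet above hard-codes row 2. If other rows could carry packed records too, looping over all rows is one way to generalize it. This is only a sketch, assuming that names contain no spaces and that every extra record has exactly four tokens; `expand_packed_rows` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

def expand_packed_rows(df, col='B'):
    """Hypothetical helper: unpack extra records stored in `col`, for every row."""
    rows = []
    for record in df.to_dict(orient='records'):
        tokens = str(record[col]).split()
        record[col] = tokens[0]
        rows.append(record)
        # tokens[1:3] repeat this row's own C and D; the rest come in groups of four.
        extra = tokens[3:]
        for i in range(0, len(extra) - 3, 4):
            a, b, c, d = extra[i:i + 4]
            rows.append({'A': a.lstrip('@'), 'B': b.lstrip('@'),
                         'C': int(c.lstrip('@')), 'D': d})
    return pd.DataFrame(rows, columns=df.columns)

# Sample data from the question.
df = pd.DataFrame({'A': ['000', '001', '002'],
                   'B': ['Name0', 'Name1', 'Name2 @35 @DI @003 @Name3 @68 @DI'],
                   'C': [27, 24, 35],
                   'D': ['@DI', '@DI', '@DI']})
result = expand_packed_rows(df)
print(result['A'].tolist())   # ['000', '001', '002', '003']
```

Incomplete trailing groups (fewer than four tokens) are silently ignored here; whether that is the right behavior depends on the real data.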
Upvotes: 1