heitorlopes2

Reputation: 56

Is there a way to create new rows from a split string?

I need help with something like this:

My input (after reading a .csv file):

import pandas as pd

data = {'A': ['000', '001', '002'],
        'B': ['Name0', 'Name1', 'Name2 @35 @DI @003 @Name3 @68 @DI'],
        'C': [27, 24, 35],
        'D': ['@DI', '@DI', '@DI']}

df = pd.DataFrame(data)

My desired output:

data = {'A': ['000', '001', '002', '003'],
        'B': ['Name0', 'Name1', 'Name2', 'Name3'],
        'C': [27, 24, 35, 68],
        'D': ['@DI', '@DI', '@DI', '@DI']}

I don't know how to explain it any better than this.

I appreciate the help.

Thanks!

Upvotes: -1

Views: 60

Answers (2)

Nk03

Reputation: 14949

You can split the string on '@', then group the pieces by their position (the split function below does that). Finally, replace the values in the affected columns with the resulting lists and use pd.Series.explode to turn each list element into its own row.

Check whether that's what you need; if so, I can explain it in more detail.

import numpy as np

df['split'] = df['B'].str.split('@')

def split(id, x):
    # Rows without any '@' have nothing to unpack
    if len(x) <= 1:
        return np.nan
    a_list = [id]
    b_list = []
    c_list = []
    d_list = []
    # The pieces repeat in the order B, C, D, A
    for index, i in enumerate(x):
        i = i.strip()  # drop the space left in front of each '@'
        if index % 4 == 0:
            b_list.append(i)
        elif (index + 1) % 4 == 0:
            a_list.append(i)
        elif (index + 2) % 4 == 0:
            d_list.append('@' + i)
        else:
            c_list.append(int(i))
    return [a_list, b_list, c_list, d_list]

df['split'] = df.apply(lambda x: split(x['A'], x['split']), axis=1)

# Cells temporarily hold lists, so make every column object dtype first
df = df.astype(object)
mask = df['split'].notna()
for pos, col in enumerate(['A', 'B', 'C', 'D']):
    df.loc[mask, col] = df.loc[mask, 'split'].str[pos]
df = df.drop('split', axis=1)
# Explode every column so each list element becomes its own row
df = df.apply(pd.Series.explode)

Output:

     A      B   C    D
0  000  Name0  27  @DI
1  001  Name1  24  @DI
2  002  Name2  35  @DI
2  003  Name3  68  @DI

Upvotes: 1

Vvvvvv

Reputation: 174

This seems more like manual work than a technical problem :)

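# Handles the one packed row (index 2); its own C and D are already filled in
entry_list = df.loc[2, 'B'].split(' ')
df.loc[2, 'B'] = entry_list[0]
# Skip the row's own name, C and D; what remains are groups of A, B, C, D
entry_list = entry_list[3:]
lines = []
for i in range(0, len(entry_list), 4):
    raw_line = entry_list[i:i+4]
    # Strip '@' from A, B and C, but keep it on D
    line = [item.replace('@', '') for item in raw_line[:-1]]
    line.append(raw_line[-1])
    lines.append(line)
df = pd.concat([df, pd.DataFrame(lines, columns=df.columns)]).reset_index(drop=True)
# The appended C values are strings, so restore the integer dtype
df['C'] = df['C'].astype(int)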
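
The same chunking idea can be applied to every row instead of a hard-coded index. The following is only a minimal sketch, not a tested general solution: `expand_rows` is a hypothetical helper name, and it assumes that names contain no spaces, that the first three tokens of a packed B cell are the row's own B, C and D, and that every extra record is exactly four whitespace-separated tokens in the order A, B, C, D.

```python
import pandas as pd


def expand_rows(df):
    """Hypothetical helper: unpack extra records packed into column 'B'."""
    rows = []
    for a, b, c, d in df[['A', 'B', 'C', 'D']].itertuples(index=False):
        tokens = str(b).split()
        # The first token is the row's own name; C and D are already in their columns
        rows.append([a, tokens[0], c, d])
        # Skip the row's own name, C and D; the rest are groups of A, B, C, D
        extra = tokens[3:]
        for i in range(0, len(extra) - 3, 4):
            rec_a, rec_b, rec_c, rec_d = extra[i:i + 4]
            # Strip '@' from A, B and C, keep it on D, and make C an integer
            rows.append([rec_a.lstrip('@'), rec_b.lstrip('@'),
                         int(rec_c.lstrip('@')), rec_d])
    return pd.DataFrame(rows, columns=['A', 'B', 'C', 'D'])


data = {'A': ['000', '001', '002'],
        'B': ['Name0', 'Name1', 'Name2 @35 @DI @003 @Name3 @68 @DI'],
        'C': [27, 24, 35],
        'D': ['@DI', '@DI', '@DI']}
df = pd.DataFrame(data)

result = expand_rows(df)
print(result)
```

Building a plain list of rows and constructing the frame once keeps C numeric and avoids repeated pd.concat calls; incomplete trailing groups (fewer than four tokens) are silently ignored in this sketch.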

Upvotes: 1
