Ann

Reputation: 95

Split needful rows in DataFrame

I have a table:

                                   Name1 Name2 Name3
0                                    ABC   FGD   NNY
1  111S  PC  1T  Trees are always yellow   NaN   NaN
2                                      P   FGD   NNY
3                                    JJJ   FGD   NNY
4  111S  PC  1T  Trees are always yellow   NaN   NaN
5                                    ABC   FGD   NNY
6                                    UIK    GJ    DE

and I want to get this:

  Name1 Name2 Name3                    Name4
0   ABC   FGD   NNY                      NaN
1  111S    PC    1T  Trees are always yellow
2     P   FGD   NNY                      NaN
3   JJJ   FGD   NNY                      NaN
4  111S    PC    1T  Trees are always yellow
5   ABC   FGD   NNY                      NaN
6   UIK    GJ    DE                      NaN

I need to split only some rows; the other rows should not change. I was able to find the rows in which the data needs to be split:

if df[colname1].isnull().any():
    df_index = df[df[colname1].isnull()].index
    print(df_index)

Now I need to split the values in those strings. I have something like this:

if df[colname1].isnull().any():
    df_index = df[df[colname1].isnull()].index
    print(df_index)

    for i in df_index:
        print(i)
        df1 = df.loc[i, colname].split('     ')

df1 holds the information I need, but I don't know how to put it into the DataFrame df at the right index. Could you help me with this?

Upvotes: 0

Views: 36

Answers (2)

ysearka

Reputation: 3855

IIUC, your columns are delimited by a double space, while the sentences contain only single spaces. You can use that to perform the split.

import numpy as np

idx = df.loc[df.Name2.isnull()].index
df['Name4'] = np.nan
df.loc[idx] = df.loc[idx].Name1.str.split('  ', expand=True).values

    Name1   Name2   Name3   Name4
0   ABC     FGD     NNY     NaN
1   111S    PC      1T      Trees are always yellow
2   P       FGD     NNY     NaN
3   JJJ     FGD     NNY     NaN
4   111S    PC      1T      Trees are always yellow
5   ABC     FGD     NNY     NaN
6   UIK     GJ      DE      NaN
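
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the table from the question by hand and assumes that the packed rows use exactly two spaces between fields, which is how they are displayed there. The `n=3` cap and the object dtype for `Name4` are my own additions rather than part of the answer above: the first keeps any later double space inside the sentence, the second avoids writing strings into a float column.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table (assumption: packed rows use two spaces between fields).
packed = "111S  PC  1T  Trees are always yellow"
df = pd.DataFrame({
    "Name1": ["ABC", packed, "P", "JJJ", packed, "ABC", "UIK"],
    "Name2": ["FGD", np.nan, "FGD", "FGD", np.nan, "FGD", "GJ"],
    "Name3": ["NNY", np.nan, "NNY", "NNY", np.nan, "NNY", "DE"],
})

# Rows whose Name2 is missing are the ones that need splitting.
mask = df["Name2"].isna()

# Create the new column as object dtype so it can hold strings.
df["Name4"] = pd.Series(np.nan, index=df.index, dtype="object")

# n=3 caps the result at four pieces; expand=True returns one column per piece.
parts = df.loc[mask, "Name1"].str.split("  ", n=3, expand=True)

# Write the four pieces back into the masked rows only.
df.loc[mask, ["Name1", "Name2", "Name3", "Name4"]] = parts.to_numpy()
print(df)
```

Rows that were not selected by `mask` are left untouched, so their `Name4` stays missing.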

Upvotes: 0

BENY

Reputation: 323316

Use str.split with n to cap the number of splits:

s=df.fillna('').apply('  '.join,1)
s.str.split('  ',n=3)
Out[189]: 
0                                [ABC, FGD, NNY]
1    [111S, PC, 1T, Trees are always yellow    ]
2                                  [P, FGD, NNY]
3                                [JJJ, FGD, NNY]
4    [111S, PC, 1T, Trees are always yellow    ]
5                                [ABC, FGD, NNY]
6                                  [UIK, GJ, DE]
dtype: object
pd.DataFrame(s.str.split('  ',n=3).tolist())
Out[190]: 
      0    1    2                            3
0   ABC  FGD  NNY                         None
1  111S   PC   1T  Trees are always yellow    
2     P  FGD  NNY                         None
3   JJJ  FGD  NNY                         None
4  111S   PC   1T  Trees are always yellow    
5   ABC  FGD  NNY                         None
6   UIK   GJ   DE                         None
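
Note that the fourth column above keeps trailing spaces, because the empty cells are joined in as well. A possible variation, sketched here under the same two-space assumption, strips that padding before splitting and names the columns. It uses `expand=True` instead of `tolist()`, and the column names are simply taken from the question's desired output.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table (assumption: packed rows use two spaces between fields).
packed = "111S  PC  1T  Trees are always yellow"
df = pd.DataFrame({
    "Name1": ["ABC", packed, "P", "JJJ", packed, "ABC", "UIK"],
    "Name2": ["FGD", np.nan, "FGD", "FGD", np.nan, "FGD", "GJ"],
    "Name3": ["NNY", np.nan, "NNY", "NNY", np.nan, "NNY", "DE"],
})

# Join each row into one string; missing cells become empty strings,
# which is what leaves trailing spaces on the packed rows.
joined = df.fillna("").apply("  ".join, axis=1)

# Drop the trailing padding, then split into at most four pieces.
result = joined.str.rstrip().str.split("  ", n=3, expand=True)
result.columns = ["Name1", "Name2", "Name3", "Name4"]
print(result)
```

Rows with only three fields end up with a missing value in `Name4`.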

Upvotes: 1
