Reputation: 45

How to remove \n from data frame and move the data to new row

I have a pandas DataFrame that looks something like this:

Index   column1   column2   column3
  0        3 \n9     2 \n89     8 \n56
  1        
  2        8         6          4
  3        4 \n9     12 \n12    32 \n5
  4                
  5         78       68           56

I want to get rid of the \n and move the leftover content to the next line, so that the data frame looks something like this:

Index   column1   column2   column3
  0        3        2          8
  1        9        89         56 
  2        8        6          4
  3        4        12         32
  4        9        12         5
  5        78       68         56

I have been able to remove the \n using the replace function:

df1.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True)

But I have not been able to move the integer values 9, 89 and 56 to the next line. Is this possible?

Sample data:

{'column1': {0: '3 \\n9', 1: '', 2: 8, 3: '4 \\n9', 4: '', 5: 78},
 'column2': {0: '2 \\n89', 1: '', 2: 6, 3: '12 \\n12', 4: '', 5: 68}, 
 'column3': {0: '8 \\n56', 1: '', 2: 4, 3: '32 \\n5', 4: '', 5: 56}}

Upvotes: 2

Answers (3)

jcaliz

Reputation: 4021

A one-liner using extractall and apply

df1[['column1', 'column2', 'column3']] = \
    df1[['column1', 'column2', 'column3']].apply(
        lambda x: x.astype(str).str.extractall(r'([0-9]+)')\
             .reset_index(drop=True)[0])
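
For reference, here is a minimal self-contained sketch of the same idea run against the sample data from the question. The helper name `extract_numbers`, the `cols` list and the final `astype(int)` conversion are additions for illustration and are not part of the one-liner above. The sketch assumes each column yields exactly as many numbers as the frame has rows.

```python
import pandas as pd

# Sample data from the question; the strings contain a literal backslash-n.
df1 = pd.DataFrame({
    'column1': {0: '3 \\n9', 1: '', 2: 8, 3: '4 \\n9', 4: '', 5: 78},
    'column2': {0: '2 \\n89', 1: '', 2: 6, 3: '12 \\n12', 4: '', 5: 68},
    'column3': {0: '8 \\n56', 1: '', 2: 4, 3: '32 \\n5', 4: '', 5: 56},
})

cols = ['column1', 'column2', 'column3']  # illustrative name

def extract_numbers(col):
    # extractall returns one row per regex match, in order of appearance.
    matches = col.astype(str).str.extractall(r'([0-9]+)')
    # Column 0 holds the single capture group; replace the MultiIndex
    # with a plain 0..n-1 index so the values line up with the rows.
    return matches[0].reset_index(drop=True)

df1[cols] = df1[cols].apply(extract_numbers)
df1[cols] = df1[cols].astype(int)  # extracted values are strings
print(df1)
```

Because the pattern matches runs of digits only, a value such as -3 would lose its sign and 1.5 would be split into two matches, so this approach suits non-negative integers like the ones in the question.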

Upvotes: 0

Shubham Sharma

Reputation: 71689

Use:

import numpy as np

def expand(col):
    # Drop empty cells, split on whitespace + literal "\n", one value per row.
    return (
        col.astype(str)
        .replace('', np.nan).dropna().str.split(r"\s\\n")
        .explode().reset_index(drop=True)
    )


df[["column1", "column2", "column3"]] = df[[
    "column1", "column2", "column3"]].apply(expand)
print(df)

This prints:

  column1 column2 column3
0       3       2       8
1       9      89      56
2       8       6       4
3       4      12      32
4       9      12       5
5      78      68      56
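
To see what each step contributes, the chain can be unrolled on a single column. This is only an illustrative sketch using column1 from the question's sample data; the intermediate variable names are made up for the walkthrough.

```python
import numpy as np
import pandas as pd

# column1 from the question; the strings contain a literal backslash-n.
col = pd.Series(['3 \\n9', '', 8, '4 \\n9', '', 78], name='column1')

# 1. Make every cell a string so the .str accessor works on all rows.
as_text = col.astype(str)

# 2. Turn empty cells into NaN and drop them (rows 1 and 4 here).
non_empty = as_text.replace('', np.nan).dropna()

# 3. Split on whitespace followed by a literal backslash-n;
#    each cell becomes a list of one or two strings.
parts = non_empty.str.split(r"\s\\n")

# 4. Give every list element its own row and renumber the index.
flat = parts.explode().reset_index(drop=True)

print(flat.tolist())
```

The values are still strings at this point; `pd.to_numeric` or `astype(int)` can be applied afterwards if numeric columns are needed.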

Upvotes: 1

Henry Yik

Reputation: 22493

One way is to define a function to flatten the columns:

from itertools import chain

def flatten(col):
    return list(chain.from_iterable([i for i in col.str.split(r" \\n") if i]))

df[["column2","column3"]] = df[["column2","column3"]].apply(flatten)

print(df)

   Index  column1 column2 column3
0      0        3       2       8
1      1        7      89      56
2      2        8       6       4

EDIT: With the new sample data, here is an updated method:

def flatten(col):
    # Split on a space followed by a literal backslash-n, then drop empty strings.
    return [i for i in chain.from_iterable(col.str.split(r" \\n")) if i]

print(df.astype(str).apply(flatten))

  column1 column2 column3
0       3       2       8
1       9      89      56
2       8       6       4
3       4      12      32
4       9      12       5
5      78      68      56
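
As a runnable sketch against the question's sample data, the updated method can be put together as follows. Two things here are assumptions added for illustration: the result is built with a dict comprehension instead of `apply`, and `pd.to_numeric` is used to turn the strings into numbers. The split pattern targets the literal backslash-n found in the sample data.

```python
from itertools import chain

import pandas as pd

# Sample data from the question; the strings contain a literal backslash-n.
df = pd.DataFrame({
    'column1': {0: '3 \\n9', 1: '', 2: 8, 3: '4 \\n9', 4: '', 5: 78},
    'column2': {0: '2 \\n89', 1: '', 2: 6, 3: '12 \\n12', 4: '', 5: 68},
    'column3': {0: '8 \\n56', 1: '', 2: 4, 3: '32 \\n5', 4: '', 5: 56},
})

def flatten(col):
    # Split each cell, chain the per-cell lists into one stream of strings,
    # and keep only the non-empty ones.
    return [i for i in chain.from_iterable(col.str.split(r" \\n")) if i]

# Flatten each column separately, then rebuild the frame.
result = pd.DataFrame({name: flatten(df[name].astype(str)) for name in df.columns})
result = result.apply(pd.to_numeric)  # strings -> numbers
print(result)
```

The filter has to run on the flattened strings rather than on the per-cell lists: splitting an empty cell gives `['']`, which is a non-empty list and would pass an `if i` test applied to the lists themselves.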

Upvotes: 3
