Reputation: 45
I have a pandas DataFrame that looks something like this:
Index  column1  column2  column3
0      3 \n9    2 \n89   8 \n56
1
2      8        6        4
3      4 \n9    12 \n12  32 \n5
4
5      78       68       56
I want to get rid of the \n and move the leftover content to the next line, so that the data frame looks like this:
Index column1 column2 column3
0 3 2 8
1 9 89 56
2 8 6 4
3 4 12 32
4 9 12 5
5 78 68 56
I have been able to remove the \n using the replace function:
df1.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["",""], regex=True)
But I have not been able to move the integer values 9, 89 and 56 to the next line. Is this possible?
Sample data:
{'column1': {0: '3 \\n9', 1: '', 2: 8, 3: '4 \\n9', 4: '', 5: 78},
'column2': {0: '2 \\n89', 1: '', 2: 6, 3: '12 \\n12', 4: '', 5: 68},
'column3': {0: '8 \\n56', 1: '', 2: 4, 3: '32 \\n5', 4: '', 5: 56}}
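To make the problem reproducible, here is a minimal sketch that loads the sample data above and applies a simplified version of the replace call. The name `cleaned` and the reduced pattern are assumptions for illustration. It shows that replace only edits each cell in place: the row count stays the same and the second value stays in the same cell.

```python
import pandas as pd

data = {
    "column1": {0: "3 \\n9", 1: "", 2: 8, 3: "4 \\n9", 4: "", 5: 78},
    "column2": {0: "2 \\n89", 1: "", 2: 6, 3: "12 \\n12", 4: "", 5: 68},
    "column3": {0: "8 \\n56", 1: "", 2: 4, 3: "32 \\n5", 4: "", 5: 56},
}
df1 = pd.DataFrame(data)

# Remove the literal backslash-n marker (two characters, not a real newline).
cleaned = df1.replace(to_replace=r"\\n", value="", regex=True)

print(cleaned.shape)               # (6, 3): still six rows
print(cleaned.loc[0, "column1"])   # 3 9  (both values remain in row 0)
print(repr(cleaned.loc[1, "column1"]))  # '' (row 1 is still empty)
```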
Upvotes: 2
Views: 356
Reputation: 4021
A one-liner using extractall and apply:
df1[['column1', 'column2', 'column3']] = \
    df1[['column1', 'column2', 'column3']].apply(
        lambda x: x.astype(str).str.extractall(r'([0-9]+)')\
                   .reset_index(drop=True)[0])
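To see what the one-liner does per column, here is a small sketch on a single Series built from the question's column1 values. The names `s`, `matches` and `flat` are only for illustration. extractall returns one row per regex match, so the empty cells simply disappear and the cells holding two numbers contribute two rows.

```python
import pandas as pd

# column1 from the question's sample data, cast to strings
s = pd.Series(["3 \\n9", "", 8, "4 \\n9", "", 78]).astype(str)

# One row per match; the index is (original row, match number)
# and column 0 holds the captured digits.
matches = s.str.extractall(r"([0-9]+)")

# Drop the two-level index so the values line up with rows 0..5 again.
flat = matches.reset_index(drop=True)[0]

print(flat.tolist())  # ['3', '9', '8', '4', '9', '78']
```

Note that the results are strings; if integers are needed, something like `flat.astype(int)` should work for this data.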
Upvotes: 0
Reputation: 71689
Use:
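import numpy as np

def expand(col):
    # Drop the empty cells, split on the literal " \n" marker,
    # and give every piece its own row.
    return (
        col.astype(str)
        .replace('', np.nan).dropna().str.split(r"\s\\n")
        .explode().reset_index(drop=True)
    )

df[["column1", "column2", "column3"]] = df[[
    "column1", "column2", "column3"]].apply(expand)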
print(df)
This prints:
column1 column2 column3
0 3 2 8
1 9 89 56
2 8 6 4
3 4 12 32
4 9 12 5
5 78 68 56
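One caveat: this approach lines up only because every split value has an empty cell waiting for it, so each column yields exactly as many pieces as there are rows. Below is a rough sketch of the same idea with an explicit length check. The helper name `expand_checked` and the error message are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def expand_checked(col):
    # Same steps as above: drop empty cells, split on the literal
    # " \n" marker, then put every piece on its own row.
    pieces = (
        col.astype(str)
        .replace("", np.nan).dropna().str.split(r"\s\\n")
        .explode().reset_index(drop=True)
    )
    # Assumption: a mismatch means the data does not fit the pattern
    # "one empty row per extra value", so fail loudly.
    if len(pieces) != len(col):
        raise ValueError(
            f"{col.name}: {len(pieces)} values for {len(col)} rows"
        )
    return pieces

df = pd.DataFrame({
    "column1": {0: "3 \\n9", 1: "", 2: 8, 3: "4 \\n9", 4: "", 5: 78},
    "column2": {0: "2 \\n89", 1: "", 2: 6, 3: "12 \\n12", 4: "", 5: 68},
    "column3": {0: "8 \\n56", 1: "", 2: 4, 3: "32 \\n5", 4: "", 5: 56},
})
result = df.apply(expand_checked)
print(result["column1"].tolist())  # ['3', '9', '8', '4', '9', '78']
```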
Upvotes: 1
Reputation: 22493
One way is to define a function to flatten the columns:
from itertools import chain

def flatten(col):
    return list(chain.from_iterable([i for i in col.str.split(r" \\n") if i]))

df[["column2","column3"]] = df[["column2","column3"]].apply(flatten)
print(df)
Index column1 column2 column3
0 0 3 2 8
1 1 7 89 56
2 2 8 6 4
EDIT: With the new sample data, here is an updated method:
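def flatten(col):
    # The pattern is treated as a regex, so the literal backslash in the
    # sample data has to be escaped: r" \\n" matches space, backslash, n.
    return [i for i in chain.from_iterable(col.str.split(r" \\n")) if i]

print(df.astype(str).apply(flatten))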
column1 column2 column3
0 3 2 8
1 9 89 56
2 8 6 4
3 4 12 32
4 9 12 5
5 78 68 56
Upvotes: 3