Reputation: 47
I am trying to preprocess one of my columns in my Data frame. The issue is that I have [[ content1] , [content2], [content3]] in the relations column. I want to remove the Brackets
i have tried this following:
df['value'] = df['value'].str[0]
the output that i get is [content 1]
df
print df
id value
1 [[str1],[str2],[str3]]
2 [[str4],[str5]]
3 [[str1]]
4 [[str8]]
5 [[str9]]
6 [[str4]]
the expected output should be like
id value
1 str1,str2,str3
2 str4,str5
3 str1
4 str8
5 str9
6 str4
Upvotes: 1
Views: 520
Reputation: 66
You can use useful regex python package re
.
This is the solution.
import pandas as pd
import re
make the test data
data = [
[1, '[[str1],[str2],[str3]]'],
[2, '[[str4],[str5]]'],
[3, '[[str1]]'],
[4, '[[str8]]'],
[5, '[[str9]]'],
[6, '[[str4]]']
]
conver data to Dataframe
df = pd.DataFrame(data, columns = ['id', 'value'])
print(df)
remove '[', ']' from the 'value' column
df['value']=df.apply(lambda x: re.sub("[\[\]]", "", x['value']),axis=1)
print(df)
Upvotes: 0
Reputation: 8816
As I could see, your data and sampling the same:
df = pd.DataFrame({'id':[1,2,3,4,5,6], 'value':['[[str1],[str2],[str3]]', '[[str4],[str5]]', '[[str1]]', '[[str8]]', '[[str9]]', '[[str4]]']})
print(df)
id value
0 1 [[str1],[str2],[str3]]
1 2 [[str4],[str5]]
2 3 [[str1]]
3 4 [[str8]]
4 5 [[str9]]
5 6 [[str4]]
df['value'] = df['value'].str.replace('[', '').astype(str).str.replace(']', '')
print(df)
id value
0 1 str1,str2,str3
1 2 str4,str5
2 3 str1
3 4 str8
4 5 str9
5 6 str4
Note: as the error code says AttributeError: Can only use .str accessor with string values
which means it's not treating it as str
hence you may cast it to str
by astype(str)
and then do the replace operation.
Upvotes: 0
Reputation: 260975
It looks like you have lists of lists. You can try to unnest and join:
df['value'] = df['value'].apply(lambda x: ','.join([e for l in x for e in l]))
Or:
from itertools import chain
df['value'] = df['value'].apply(lambda x: ','.join(chain.from_iterable(x)))
NB. If you get an error, please provide it and the type of the column (df.dtypes
)
Upvotes: 1