Reputation: 2411
My pandas dataframe somehow got messed up. There are 2 columns in it that were supposed to contain lists, but now they contain strings of lists:
id. array
72 [ 2.2545414 -0.8302277 -9.557333 1.944972...
73 [ 3.0519443 1.2425094 -1.7121094 0.394222...
74 [ 2.9175313 1.0301533 -1.0083416 1.545938...
77 [-8.521629 3.2176793 2.5869853 1.399137...
id. names_arrays
72 ['T恤', '外套', '夹克', '衬衣', '领带', '衬衫', '围巾', '粉色...
73 ['济科', '外画', '段萍', '泰舍', '萎缩性', '祝丹妮', '大京', '...
74 ['秀场', '时装周', '时装秀', '舞台', '红毯', '时装设计', '复古风'...
You can't see it on the dataframe itself, but when I print:
np.array(df['array'][:1])[0]
I get
'[ 2.2545414 -0.8302277 -9.557333 1.9449722 3.7186048 5.790459\n 0.07255215 1.3358237 -2.9177604 4.03371 -1.4177471 -1.2400303\n 2.5485678 1.0194561 0.14744097 -1.0286134 2.1207867 -1.6046501\n 3.640595 11.30236 0.98157316 -4.8968134 -0.80825585 -2.9547403\n 8.363517 -0.7563907 0.590438 0.14872111 0.28678164 -4.1656523\n 0.21350707 2.7396295 -0.86256826 -3.0678177 -2.2119153 -3.3205476\n 1.7437696 -3.5955458 -3.811455 -2.4635699 2.3464768 3.774634\n]'
And the other column:
np.array(df['names_arrays'][:1])[0]
>>> "['T恤', '外套', '夹克', '衬衣', '领带', '衬衫', '围巾', '粉色', '纽扣', '球鞋']"
I found this to be useful for the names_arrays column:
literal_eval(np.array(df['names_arrays'][:1])[0])
>>> ['T恤', '外套', '夹克', '衬衣', '领带', '衬衫', '围巾', '粉色', '纽扣', '球鞋']
But 1. I'm not sure how to do this for the entire dataframe (rather than a single row), and 2. it doesn't work for the array column, since that one has no commas between the numbers and sometimes has \n characters in between.
Upvotes: 0
Views: 141
Reputation: 9619
You can use applymap with a custom function:
import pandas as pd

data = [
    ('[ 2.2545414 -0.8302277 -9.557333 1.944972]', "['T恤', '外套', '夹克', '衬衣', '领带', '衬衫', '围巾', '粉色']"),
    ('[ 3.0519443 1.2425094 -1.7121094 0.394222]', "['济科', '外画', '段萍', '泰舍', '萎缩性', '祝丹妮', '大京']"),
]
df = pd.DataFrame(data, columns=['array', 'names_arrays'])

def fix_lists(text):
    # Drop brackets and quotes, turn commas into spaces, then split on any whitespace (including \n)
    return text.replace('[', '').replace(']', '').replace(',', ' ').replace("'", '').split()

# On pandas >= 2.1, applymap is deprecated in favour of DataFrame.map
df = df.applymap(fix_lists)
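After this, df['array'][0][0] will return the string '2.2545414' (the elements are still strings, so wrap them in float() if you need numbers), and df['names_arrays'][0][0] will return 'T恤'.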
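If you need real floats in the array column, and names that may themselves contain spaces or commas, one option is to parse each column separately instead of sharing one function. The following is only a sketch under assumptions: the sample rows are shortened from the question, and parse_float_array is a hypothetical helper name, not something from pandas or NumPy.

```python
import ast

import numpy as np
import pandas as pd

# Sample rows shortened from the question; note the embedded "\n" in the first one.
df = pd.DataFrame({
    "array": ["[ 2.2545414 -0.8302277\n -9.557333 1.944972\n]",
              "[ 3.0519443 1.2425094 -1.7121094 0.394222]"],
    "names_arrays": ["['T恤', '外套', '夹克']", "['济科', '外画', '段萍']"],
})


def parse_float_array(text):
    """Hypothetical helper: turn '[ 1.0 2.0\\n 3.0]' into a float NumPy array."""
    # str.split() with no argument splits on any whitespace, including "\n".
    return np.array(text.strip().strip("[]").split(), dtype=float)


# Apply each parser to its own column, over every row at once.
df["array"] = df["array"].apply(parse_float_array)
# literal_eval reads the quoted, comma-separated column as a Python list literal.
df["names_arrays"] = df["names_arrays"].apply(ast.literal_eval)
```

With this, df['array'][0] is a NumPy array of floats and df['names_arrays'][0] is a regular Python list of strings. Using literal_eval for the names keeps entries intact even if a name contains a space or a comma, which the replace-and-split approach would break apart.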
Upvotes: 1