Vega

Reputation: 2929

Convert dataframe column with type "object" to a set()

I have a DataFrame df with a column "Id":

     Id
0    -KkJz3CoJNM
1    08QMXEQbEWw
2    0ANuuVrIWJw
3    0pPU8CtwXTo
4    1-wYH2LEcmk

I need to convert column "Id" into a set(), but

set_id = set(df["Id"])
print(set_id)

returns

{'Id'}

instead of a set() of the strings from column "Id". Why?

Upvotes: 2

Views: 7020

Answers (1)

jezrael

Reputation: 862511

For me it works correctly if there is only one "Id" column:

set_id = set(df["Id"])
print(set_id)
{'1-wYH2LEcmk', '08QMXEQbEWw', '0pPU8CtwXTo', '0ANuuVrIWJw', '-KkJz3CoJNM'}

But if more than one column is named "Id", then df["Id"] returns a DataFrame, so set(df["Id"]) returns the unique column names:

import pandas as pd

# test with 2 columns named "Id", built from the sample data
df = pd.concat([df, df], axis=1)
print(df["Id"])
            Id           Id
0  -KkJz3CoJNM  -KkJz3CoJNM
1  08QMXEQbEWw  08QMXEQbEWw
2  0ANuuVrIWJw  0ANuuVrIWJw
3  0pPU8CtwXTo  0pPU8CtwXTo
4  1-wYH2LEcmk  1-wYH2LEcmk

set_id = set(df["Id"])
print(set_id)
{'Id'}

This is because iterating over a DataFrame yields its column names, not its values:

L = list(df["Id"])
print(L)
['Id', 'Id']

which works the same as

L = list(df["Id"].columns)
print(L)
['Id', 'Id']

and similar for sets:

set_id = set(df["Id"].columns)
print(set_id)
{'Id'}
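
A quick way to check whether this is what is happening is to look at the type returned by df["Id"] and at df.columns.duplicated(). The following is a minimal sketch, not part of the original answer; the names ids, single and doubled are made up for illustration, and only three of the sample IDs are used:

```python
import pandas as pd

# Hypothetical sample data, taken from the first rows of the question.
ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw"]
single = pd.DataFrame({"Id": ids})
doubled = pd.concat([single, single], axis=1)  # two columns, both named "Id"

# Unique label: indexing returns a Series, and iterating it yields the values.
single_is_series = isinstance(single["Id"], pd.Series)
# Duplicated label: indexing returns a DataFrame, and iterating it yields column labels.
doubled_is_frame = isinstance(doubled["Id"], pd.DataFrame)
# True when at least one column label occurs more than once.
has_duplicates = bool(doubled.columns.duplicated().any())

set_single = set(single["Id"])    # the ID strings
set_doubled = set(doubled["Id"])  # only the column label

print(single_is_series, doubled_is_frame, has_duplicates)
print(set_doubled)
```

If has_duplicates is True for your own DataFrame, the duplicated column name is the likely cause of the {'Id'} result.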

A possible solution is to deduplicate the column names:

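# column names as a Series, so repeated names can be grouped and counted
c = df.columns.to_series()

# append ".1", ".2", ... to repeated names; the first occurrence keeps its name
df.columns += c.groupby(c).cumcount().astype(str).radd('.').replace('.0','')
print(df)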
            Id         Id.1
0  -KkJz3CoJNM  -KkJz3CoJNM
1  08QMXEQbEWw  08QMXEQbEWw
2  0ANuuVrIWJw  0ANuuVrIWJw
3  0pPU8CtwXTo  0pPU8CtwXTo
4  1-wYH2LEcmk  1-wYH2LEcmk
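
The same renaming can also be sketched with a plain Python loop, which may be easier to read than the groupby one-liner. This is an illustrative alternative, not the method above: dedupe_labels, base and df2 are hypothetical names, the labels are assumed to be strings, and the sketch does not guard against a generated name such as "Id.1" colliding with an existing column:

```python
import pandas as pd

def dedupe_labels(labels):
    """Append .1, .2, ... to repeated labels; the first occurrence is unchanged."""
    seen = {}    # label -> number of times seen so far
    result = []
    for label in labels:
        count = seen.get(label, 0)
        result.append(label if count == 0 else f"{label}.{count}")
        seen[label] = count + 1
    return result

# Hypothetical two-row sample with a duplicated "Id" column.
base = pd.DataFrame({"Id": ["-KkJz3CoJNM", "08QMXEQbEWw"]})
df2 = pd.concat([base, base], axis=1)

df2.columns = dedupe_labels(df2.columns)
new_columns = list(df2.columns)

# "Id" is now unique, so df2["Id"] is a Series and set() sees the values.
set_id = set(df2["Id"])
print(new_columns)
```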

Or, if the duplicated columns always hold the same values, remove the duplicates:

df = df.loc[:, ~df.columns.duplicated()]
print(df)
            Id
0  -KkJz3CoJNM
1  08QMXEQbEWw
2  0ANuuVrIWJw
3  0pPU8CtwXTo
4  1-wYH2LEcmk
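
Dropping the duplicates keeps only the first "Id" column, which is fine when all copies are identical. If the copies might differ, one option is to take the union of the values across all of them. A minimal sketch under that assumption, using made-up data (left, right and df3 are hypothetical names):

```python
import pandas as pd

# Hypothetical data: two "Id" columns whose values only partly overlap.
left = pd.DataFrame({"Id": ["a", "b"]})
right = pd.DataFrame({"Id": ["b", "c"]})
df3 = pd.concat([left, right], axis=1)

# Keep only the first "Id" column, then build the set from that Series.
first_only = set(df3.loc[:, ~df3.columns.duplicated()]["Id"])

# Union over every "Id" column: flatten the 2-D values into one sequence.
all_values = set(df3["Id"].to_numpy().ravel())

print(sorted(first_only))
print(sorted(all_values))
```

Which of the two is appropriate depends on whether the extra columns are true copies or carry different IDs.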

Upvotes: 5
