Reputation: 2929
I have a DataFrame df with a column "Id":
Id
0 -KkJz3CoJNM
1 08QMXEQbEWw
2 0ANuuVrIWJw
3 0pPU8CtwXTo
4 1-wYH2LEcmk
I need to convert column "Id" into a set, but
set_id = set(df["Id"])
print(set_id)
returns
{'Id'}
instead of a set of the strings in column "Id". Why?
Upvotes: 2
Views: 7020
Reputation: 862511
For me it works correctly if there is only one "Id" column:
set_id = set(df["Id"])
print(set_id)
{'1-wYH2LEcmk', '08QMXEQbEWw', '0pPU8CtwXTo', '0ANuuVrIWJw', '-KkJz3CoJNM'}
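A minimal self-contained sketch of the single-column case, rebuilding the sample data from the question. With a unique label, df["Id"] is a Series, and iterating over a Series yields its values, so set() collects the Id strings.

```python
import pandas as pd

# Sample data from the question, with exactly one "Id" column
df = pd.DataFrame({"Id": ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw",
                          "0pPU8CtwXTo", "1-wYH2LEcmk"]})

col = df["Id"]
print(type(col).__name__)  # Series

# Iterating over a Series yields its values, so the set holds the Id strings
set_id = set(col)
print(sorted(set_id))
```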
But if there are multiple columns named "Id", then df["Id"] returns a DataFrame instead of a Series, so set(df["Id"]) returns the unique column names:
# test with 2 "Id" columns of sample data
df = pd.concat([df, df], axis=1)
print (df["Id"])
Id Id
0 -KkJz3CoJNM -KkJz3CoJNM
1 08QMXEQbEWw 08QMXEQbEWw
2 0ANuuVrIWJw 0ANuuVrIWJw
3 0pPU8CtwXTo 0pPU8CtwXTo
4 1-wYH2LEcmk 1-wYH2LEcmk
set_id = set(df["Id"])
print(set_id)
{'Id'}
Because iterating over a DataFrame yields its column labels, not its values:
L = list(df["Id"])
print(L)
['Id', 'Id']
which works the same as:
L = list(df["Id"].columns)
print(L)
['Id', 'Id']
and similarly for sets:
set_id = set(df["Id"].columns)
print(set_id)
{'Id'}
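The behaviour above can be reproduced in one self-contained sketch. The last step is an extra option not in the original answer: if you want the values from every column labelled "Id" without renaming or dropping anything, flattening the selected DataFrame through its underlying NumPy array with to_numpy().ravel() collects all cells.

```python
import pandas as pd

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
base = pd.DataFrame({"Id": ids})

# Two columns both labelled "Id", as produced by pd.concat([df, df], axis=1)
df = pd.concat([base, base], axis=1)
print(df.columns.is_unique)  # False

sub = df["Id"]
print(type(sub).__name__)  # DataFrame, because the label is duplicated

# Iterating over a DataFrame yields column labels, not cell values
print(set(sub))  # {'Id'}

# Flatten all cells of the selected columns into one 1-D array, then build the set
set_id = set(sub.to_numpy().ravel())
print(len(set_id))  # 5
```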
One possible solution is to deduplicate the column names:
c = df.columns.to_series()
df.columns += c.groupby(c).cumcount().astype(str).radd('.').replace('.0','')
print (df)
Id Id.1
0 -KkJz3CoJNM -KkJz3CoJNM
1 08QMXEQbEWw 08QMXEQbEWw
2 0ANuuVrIWJw 0ANuuVrIWJw
3 0pPU8CtwXTo 0pPU8CtwXTo
4 1-wYH2LEcmk 1-wYH2LEcmk
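A self-contained sketch of the renaming approach. One small deviation from the snippet above, as an assumption for clarity: the labels are put into a Series with a default RangeIndex (pd.Series(df.columns)) rather than to_series(), so the groupby key and the string concatenation do not rely on a duplicated index. After renaming, df["Id"] selects a single column again.

```python
import pandas as pd

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
base = pd.DataFrame({"Id": ids})
df = pd.concat([base, base], axis=1)  # columns: ['Id', 'Id']

c = pd.Series(df.columns)
# cumcount numbers repeats of each label: 0 for the first, 1 for the second, ...
# '.0' is then replaced by '' (exact-value match), so first occurrences keep their name
suffix = c.groupby(c).cumcount().astype(str).radd(".").replace(".0", "")
df.columns = (c + suffix).to_numpy()
print(list(df.columns))  # ['Id', 'Id.1']

# "Id" is now unique, so df["Id"] is a Series of strings
set_id = set(df["Id"])
```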
Or, if the duplicated columns always contain the same values, remove the duplicates:
df = df.loc[:, ~df.columns.duplicated()]
print (df)
Id
0 -KkJz3CoJNM
1 08QMXEQbEWw
2 0ANuuVrIWJw
3 0pPU8CtwXTo
4 1-wYH2LEcmk
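A self-contained sketch of the drop-duplicates approach. Because Index.duplicated() keeps the first occurrence by default, any differing values in later "Id" columns would be silently lost; the equality check before dropping is an illustrative addition, not part of the original answer.

```python
import pandas as pd

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
base = pd.DataFrame({"Id": ids})
df = pd.concat([base, base], axis=1)  # columns: ['Id', 'Id']

# Illustrative guard: confirm every "Id" column holds the same values before dropping
sub = df["Id"]
same = all(sub.iloc[:, i].equals(sub.iloc[:, 0]) for i in range(sub.shape[1]))

# Keep only the first column for each label
df = df.loc[:, ~df.columns.duplicated()]
print(list(df.columns))  # ['Id']

set_id = set(df["Id"])
```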
Upvotes: 5