Reputation: 63994
I have the following data frame:
import pandas as pd
df = pd.DataFrame({
"ClusterID" : [1,2,2,1,3],
"Genes" : ['foo','qux','bar','cux','fii'],
})
Which looks like this:
ClusterID Genes
0 1 foo
1 2 qux
2 2 bar
3 1 cux
4 3 fii
What I want to do is convert it into a dictionary of lists:
{ '1': ['foo','cux'],
'2': ['qux','bar'],
'3': ['fii']}
How can I do that?
Upvotes: 3
Views: 4794
Reputation: 862541
You can use groupby and apply tolist, and then use Series.to_dict:
import pandas as pd
df = pd.DataFrame({
"ClusterID" : [1,2,2,1,3],
"Genes" : ['foo','qux','bar','cux','fii'],
})
print(df)
ClusterID Genes
0 1 foo
1 2 qux
2 2 bar
3 1 cux
4 3 fii
s = df.groupby('ClusterID')['Genes'].apply(lambda x: x.tolist())
print(s)
ClusterID
1 [foo, cux]
2 [qux, bar]
3 [fii]
Name: Genes, dtype: object
print(s.to_dict())
{1: ['foo', 'cux'], 2: ['qux', 'bar'], 3: ['fii']}
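Note that the keys above are integers, while the question asks for string keys. A minimal sketch of one way to get them, assuming the same df as above (the names grouped and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "ClusterID": [1, 2, 2, 1, 3],
    "Genes": ['foo', 'qux', 'bar', 'cux', 'fii'],
})

# Group the genes by cluster and collect each group into a plain list.
grouped = df.groupby('ClusterID')['Genes'].apply(list)

# Convert the integer keys to strings while building the dictionary.
result = {str(key): genes for key, genes in grouped.items()}
print(result)
```

Converting the keys at the end keeps the grouping step unchanged; casting the ClusterID column with astype(str) before grouping should work as well.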
Upvotes: 8
Reputation: 7590
dct = {x:df.Genes[df.ClusterID == x].tolist() for x in set(df.ClusterID)}
# dct == {1: ['foo','cux'], 2: ['qux','bar'], 3: ['fii']}
As your ClusterID column consists of integer values, your dictionary keys will be integers as well. If you want the keys to be strings, as in your example, simply apply the str function:
dct = {str(x):df.Genes[df.ClusterID == x].tolist() for x in set(df.ClusterID)}
Here we are using a dictionary comprehension. The expression set(df.ClusterID) gives us the set of unique values in that column (a set works here because the order of the keys does not matter). df.Genes[df.ClusterID == x] selects the values in the Genes column for the rows whose ClusterID equals x. Calling tolist() converts the resulting pandas.Series to a list.
Thus this dictionary comprehension loops through each unique value in the ClusterID column and stores the list of corresponding Genes values in the dictionary under that key.
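For reference, here is a self-contained sketch of the same comprehension. It makes one change that is my own assumption rather than part of the answer above: it iterates over df.ClusterID.unique() instead of a set, so the keys are inserted in order of first appearance in the column.

```python
import pandas as pd

df = pd.DataFrame({
    "ClusterID": [1, 2, 2, 1, 3],
    "Genes": ['foo', 'qux', 'bar', 'cux', 'fii'],
})

# unique() returns the distinct ClusterID values in order of first appearance.
# For each value, select the matching Genes rows and convert them to a list.
dct = {
    str(x): df.loc[df.ClusterID == x, 'Genes'].tolist()
    for x in df.ClusterID.unique()
}
print(dct)
```

Each iteration filters the whole column, so for a frame with many distinct clusters the groupby approach is likely the better fit.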
Upvotes: 1