Python - From Pandas to Sparse Output Format

Question

Is there a better way to do what the code below does in a (slow!) loop?

Given an input DataFrame, I want to produce, for each user, the list of products that user has consumed. The list can run to millions of entries, and the loop below seems quite inefficient (unless I use Cython). Any ideas how to make this more Pythonic? Thanks!

import pandas as pd

a = pd.DataFrame({'user_id': ['a', 'a', 'b', 'c', 'c', 'c'],
                  'prod_id': ['p1', 'p2', 'p1', 'p2', 'p3', 'p7']})

print("Input Dataframe:")
print(a)
print('\nDesired Output:')

# Build desired output: one line per user, the user id followed by its products.
uniqIDs = a.user_id.unique()

for id in uniqIDs:
    # The boolean mask rescans the whole frame for every user (the slow part).
    prod_list = list(a[a.user_id == id].prod_id.values)

    s = id + '\t'
    for x in prod_list:
        s += x + '\t'

    print(s)  # This will get saved to a TAB DELIMITED file

This gives the following output (which is exactly what I want):

Input Dataframe:
  prod_id user_id
0      p1       a
1      p2       a
2      p1       b
3      p2       c
4      p3       c
5      p7       c

Desired Output:
a   p1  p2  
b   p1  
c   p2  p3  p7
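
One possible way to avoid the Python-level loop is to let pandas do the grouping. The sketch below is only an illustration, not a benchmarked solution: the helper name `user_product_lines` and its parameters are made up for this example, and it assumes both columns hold strings. It groups the rows once with `groupby` and joins each user's products with tabs, instead of filtering the whole frame once per user.

```python
import pandas as pd


def user_product_lines(df, user_col="user_id", prod_col="prod_id", sep="\t"):
    """Return one delimited line per user: the user id, then its products.

    Hypothetical helper for illustration; assumes both columns contain strings.
    """
    # sort=False keeps users in order of first appearance, like unique() does.
    # agg(sep.join) joins each group's product ids into a single string.
    grouped = df.groupby(user_col, sort=False)[prod_col].agg(sep.join)
    return [user + sep + prods for user, prods in grouped.items()]


a = pd.DataFrame({"user_id": ["a", "a", "b", "c", "c", "c"],
                  "prod_id": ["p1", "p2", "p1", "p2", "p3", "p7"]})

lines = user_product_lines(a)
print("\n".join(lines))
```

Unlike the loop in the question, this does not leave a trailing tab at the end of each line. The resulting list could then be written to a tab-delimited file in one go, for example by writing `"\n".join(lines)` to an open file handle.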

