aloha

Reputation: 4774

group data in dictionaries

My data looks something like this:

object   weight
table     2.3
chair     1.2
chair     1.0
table     1.5
drawer    1.8
table     1.7

I would like to group my data by the type of object. I would also like to know how many of each object I have and their weights.

For example, my final data should look like this:

object     counter     weight
table         3        2.3, 1.5, 1.7
chair         2        1.2, 1.0
drawer        1        1.8

Here is my attempt:

data = pd.read_csv('data.dat', sep = '\s+')

grouped_data = {'object':[],'counter':[], 'weight':[]}
objects = ['table', 'chair', 'drawer']

for item in objects:
    counter = 0
    weight = []
    grouped_objects['object'].append(item)
    for i in range(len(data)):
        if item == data['name'][i]:
            counter += 1
            grouped_data['weight'].append(data['weight'])
            grouped_data['counter'].append(counter)

It is not giving me the desired output. Any suggestions?

Upvotes: 1

Views: 128

Answers (6)

Alexander

Reputation: 109546

You can use groupby with a dictionary comprehension.

>>> pd.DataFrame({col: [len(group), group.loc[:, 'weight'].tolist()] 
                  for col, group in df.groupby('object')}).T.rename(columns={0: 'count', 
                                                                             1: 'weights'})

       count          weights
chair      2       [1.2, 1.0]
drawer     1            [1.8]
table      3  [2.3, 1.5, 1.7]
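A self-contained sketch of the same dictionary-comprehension approach. It assumes the sample data is built inline as a DataFrame named df (instead of being read from data.dat), with the object and weight columns from the question:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: stands in for data.dat)
df = pd.DataFrame({
    'object': ['table', 'chair', 'chair', 'table', 'drawer', 'table'],
    'weight': [2.3, 1.2, 1.0, 1.5, 1.8, 1.7],
})

# One column per group holding [row count, list of weights];
# .T turns the groups into rows, rename gives the columns readable names
result = pd.DataFrame({
    name: [len(group), group['weight'].tolist()]
    for name, group in df.groupby('object')
}).T.rename(columns={0: 'count', 1: 'weights'})

print(result)
```

groupby sorts the group keys by default, so the rows come out as chair, drawer, table.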

Upvotes: 0

kmaork

Reputation: 6012

At a glance I can spot a few potential mistakes:

  • len(data) gives you the number of rows in your DataFrame (the same as data.shape[0]), so range(len(data)) is fine; data.shape[1] would be the number of columns. However, your file has no name column: it is called object, so data['name'] will raise a KeyError.
  • You are appending weights of every kind to a single weight list, instead of making a list for each kind.
  • You're not appending one weight but the whole column of weights each time (data['weight'] instead of data['weight'][i]).

I would do it differently anyway, though still with a dictionary. It makes sense for the key of each entry to be an object's type, and for the value to be the data you want to store about it. For example, {'table': {'counter': 3, 'weight': [2.3, 1.5, 1.7]}}

Now you only have to loop through your data once, fill the dictionary, and then print it in whatever format you want. This should also be more efficient, because it makes one pass over the rows instead of one pass per object type (O(n) instead of O(n·k) for k types):

import pandas as pd

data = pd.read_csv('data.dat', sep=r'\s+')

# creating initial empty dictionary (you can also create it using a loop)
info = {
    'table': {'counter': 0, 'weight': []},
    'chair': {'counter': 0, 'weight': []},
    'drawer': {'counter': 0, 'weight': []},
}

# filling dictionary with values, one row at a time
for i in range(len(data)):
    cur_dict = info[data['object'][i]]
    cur_dict['counter'] += 1
    cur_dict['weight'].append(data['weight'][i])

# printing in desired format
print('object\tcounter\tweight')
for key in info:
    cur = info[key]
    print(key + '\t' + str(cur['counter']) + '\t' + ', '.join(str(w) for w in cur['weight']))

Hope it works for you :)
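A variant that builds the dictionary with a loop instead of listing the object types up front. This sketch assumes the data is already in a DataFrame named df (built inline here rather than read from data.dat) and uses dict.setdefault to create each entry the first time its type appears:

```python
import pandas as pd

# Sample data from the question (assumption: stands in for data.dat)
df = pd.DataFrame({
    'object': ['table', 'chair', 'chair', 'table', 'drawer', 'table'],
    'weight': [2.3, 1.2, 1.0, 1.5, 1.8, 1.7],
})

# Single pass: create {'counter': 0, 'weight': []} for a type on first sight
info = {}
for obj, w in zip(df['object'], df['weight']):
    entry = info.setdefault(obj, {'counter': 0, 'weight': []})
    entry['counter'] += 1
    entry['weight'].append(w)

# Print in the object / counter / weight layout from the question
print('object\tcounter\tweight')
for key, cur in info.items():
    print(f"{key}\t{cur['counter']}\t{', '.join(map(str, cur['weight']))}")
```

Because dictionaries keep insertion order, the types are printed in the order they first appear in the file.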

Upvotes: 0

Garrett R

Reputation: 2662

I think what you actually want is a defaultdict (a class from the collections module) whose default factory returns an empty list. The len of each list then gives you the counter. For example:

from collections import defaultdict

grouped_data = defaultdict(list)

for i in range(len(data)):
    name, weight = data['object'][i], data['weight'][i]
    grouped_data[name].append(weight)

print(len(grouped_data['table']))  # number of table weights, i.e. the count
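A runnable sketch of the defaultdict idea. It assumes the rows are available as (object, weight) pairs, typed in by hand from the question; with the DataFrame you could iterate over zip(data['object'], data['weight']) instead:

```python
from collections import defaultdict

# Rows from the question as (object, weight) pairs (assumption: stands in
# for the DataFrame read from data.dat)
rows = [
    ('table', 2.3), ('chair', 1.2), ('chair', 1.0),
    ('table', 1.5), ('drawer', 1.8), ('table', 1.7),
]

grouped_data = defaultdict(list)  # a missing key starts as an empty list
for name, weight in rows:
    grouped_data[name].append(weight)

# The counter for each object is just the length of its weight list
counts = {name: len(weights) for name, weights in grouped_data.items()}
print(counts)  # {'table': 3, 'chair': 2, 'drawer': 1}
```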

Upvotes: 2

Daniel Diekmeier

Reputation: 3434

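You can get the count with len(), and you can iterate over your data directly with for item in data instead of getting an index with range. Here the data is a plain list of dicts (iterating over a pandas DataFrame this way would give you its column names, not its rows):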

data = [
    { 'name': 'table', 'weight': 2.3 },
    { 'name': 'chair', 'weight': 1.2 },
    { 'name': 'chair', 'weight': 1.0 },
    { 'name': 'table', 'weight': 1.5 },
    { 'name': 'drawer', 'weight': 1.8 },
    { 'name': 'table', 'weight': 1.7 }
]

grouped_data = {'table': [], 'chair': [], 'drawer': []}

for item in data:
    grouped_data[item['name']].append(item['weight'])

print(grouped_data)
print(len(grouped_data['table']))

{'table': [2.3, 1.5, 1.7], 'chair': [1.2, 1.0], 'drawer': [1.8]}
3
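To get from grouped_data to the object / counter / weight table in the question, a small sketch that reuses len() for the counter. It assumes the same list-of-dicts data and swaps the hard-coded keys for dict.setdefault, so new object types need no code changes:

```python
data = [
    {'name': 'table', 'weight': 2.3}, {'name': 'chair', 'weight': 1.2},
    {'name': 'chair', 'weight': 1.0}, {'name': 'table', 'weight': 1.5},
    {'name': 'drawer', 'weight': 1.8}, {'name': 'table', 'weight': 1.7},
]

grouped_data = {}
for item in data:
    # create the list the first time an object type appears
    grouped_data.setdefault(item['name'], []).append(item['weight'])

# One (object, counter, weight) row per type, weights joined as text
rows = [
    (name, len(weights), ', '.join(str(w) for w in weights))
    for name, weights in grouped_data.items()
]

print('object\tcounter\tweight')
for name, count, weights in rows:
    print(f'{name}\t{count}\t{weights}')
```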

Upvotes: 0

user2285236

Reputation:

With agg, using named aggregation (keyword arguments, pandas 0.25 or later):

df.groupby("object")["weight"].agg(counter="count", weight=lambda x: ", ".join(x.astype(str)))
Out[57]: 
        counter         weight
object                        
chair         2       1.2, 1.0
drawer        1            1.8
table         3  2.3, 1.5, 1.7
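If you would rather keep the weights as numbers than as one joined string, a variant of the same groupby/agg call can collect them into lists. This sketch assumes pandas 0.25 or later (for keyword-style named aggregation) and a DataFrame df built inline from the question's data; the output column name weights is arbitrary:

```python
import pandas as pd

# Sample data from the question (assumption: stands in for data.dat)
df = pd.DataFrame({
    'object': ['table', 'chair', 'chair', 'table', 'drawer', 'table'],
    'weight': [2.3, 1.2, 1.0, 1.5, 1.8, 1.7],
})

# Each keyword becomes an output column; list gathers each group's weights
result = df.groupby('object')['weight'].agg(counter='count', weights=list)
print(result)
```

Keeping real lists makes later arithmetic (sums, means) easier; join them into a string only when printing.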

Upvotes: 4

EdChum

Reputation: 394041

You can do it this way by using agg and passing a list of functions:

In [32]:
def counter(x):
    return len(x)

def weight(x):
    return ', '.join(x)

df.groupby('object')['weight'].agg([weight, counter]).reset_index()

Out[32]:
   object         weight  counter
0   chair       1.2, 1.0        2
1  drawer            1.8        1
2   table  2.3, 1.5, 1.7        3

This presumes that the weight column's dtype is already str; if it isn't, convert it first with df['weight'] = df['weight'].astype(str).
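A self-contained sketch of the list-of-functions approach, assuming the question's data in a DataFrame df built inline. Here the conversion to str happens inside weight() with x.astype(str) rather than on the column beforehand, which is a small variation on the answer rather than its exact code. agg names each output column after the function passed in:

```python
import pandas as pd

# Sample data from the question (assumption: stands in for data.dat)
df = pd.DataFrame({
    'object': ['table', 'chair', 'chair', 'table', 'drawer', 'table'],
    'weight': [2.3, 1.2, 1.0, 1.5, 1.8, 1.7],
})

def counter(x):
    # number of rows in the group
    return len(x)

def weight(x):
    # convert the group's floats to text, then join them with commas
    return ', '.join(x.astype(str))

# Columns are named 'weight' and 'counter' after the functions
result = df.groupby('object')['weight'].agg([weight, counter]).reset_index()
print(result)
```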

Upvotes: 1
