stephen.m

Reputation: 149

From list of list of dictionaries to Pandas DataFrame

I have a list in the following format (see the code snippet), and I want to create a DataFrame from it to achieve the desired result.

Desired result: 8.00\n4.84\n1.416 2.56\n3.104\n3.09 1.184\n7.50\n15.00

```
data = [[ {'C': 8, 'G': 1, 'T': 1},
          {'C': 4.84, 'G': 1, 'T': 2},
          {'C': 1.416, 'G': 1, 'T': 3}],
         [{'C': 2.56, 'G': 1, 'T': 1},
          {'C': 3.104, 'G': 1, 'T': 2},
          {'C': 3.09, 'G': 1, 'T': 3}],
         [{'C': 1.184, 'G': 1, 'T': 1},
          {'C': 7.5, 'G': 1, 'T': 2},
          {'C': 15, 'G': 1, 'T': 3}]]
```

Upvotes: 0

Views: 148

Answers (3)

Use a generator to create your groupings of three. This does not handle the case where the row count is not a multiple of 3.

import pandas as pd

data = [[{'C': 8, 'G': 1, 'T': 1},
         {'C': 4.84, 'G': 1, 'T': 2},
         {'C': 1.416, 'G': 1, 'T': 3}],
        [{'C': 2.56, 'G': 1, 'T': 1},
         {'C': 3.104, 'G': 1, 'T': 2},
         {'C': 3.09, 'G': 1, 'T': 3}],
        [{'C': 1.184, 'G': 1, 'T': 1},
         {'C': 7.5, 'G': 1, 'T': 2},
         {'C': 15, 'G': 1, 'T': 3}]]

# Collect the values of each key across all inner lists.
C = []
G = []
T = []
for row in data:
    for item in row:
        C.append(item['C'])
        G.append(item['G'])
        T.append(item['T'])

df = pd.DataFrame({'C': C, 'G': G, 'T': T})
print(df)

        C  G  T
0   8.000  1  1
1   4.840  1  2
2   1.416  1  3
3   2.560  1  1
4   3.104  1  2
5   3.090  1  3
6   1.184  1  1
7   7.500  1  2
8  15.000  1  3


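mylist = df['C']

# Generator of string values; each next() call consumes one item.
result = (str(item) for item in mylist)

# Join three consecutive values per line with a literal \n (raw string).
for i in range(0, len(mylist), 3):
    output = r"\n".join([next(result), next(result), next(result)])
    print(output)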

8.0\n4.84\n1.416
2.56\n3.104\n3.09
1.184\n7.5\n15.0
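The loop above calls next() three times per pass, so it stops with a StopIteration error when the number of values is not a multiple of 3. A minimal sketch of one way around that, using itertools.islice from the standard library, is shown here. The helper name `chunked` and the extra value 12.5 are made up for illustration and are not part of the answer.

```python
from itertools import islice

def chunked(values, size=3):
    """Yield successive lists of up to `size` items from `values`."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# The C values from the question plus one hypothetical extra value (12.5).
c_values = [8.0, 4.84, 1.416, 2.56, 3.104, 3.09, 1.184, 7.5, 15.0, 12.5]

# The raw string keeps \n as two literal characters, as in the loop above.
lines = [r"\n".join(str(v) for v in chunk) for chunk in chunked(c_values)]
for line in lines:
    print(line)
```

The last chunk simply comes out shorter, so the leftover value is printed on its own line.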

Upvotes: 0

Joe Ferndz

Reputation: 8508

Looking at your data, you have a list of lists of dictionaries, so it just needs to be flattened and loaded into a DataFrame. Once the data is in the DataFrame, convert the floats into strings with 3 decimal places, join the list of values, and print.

Here's how I will do it:

  • Step 1: Flatten the nested lists into a single list of dictionaries (key:value pairs)
  • Step 2: Load that list of dictionaries into a DataFrame

Steps 1 & 2 are accomplished using this list comprehension + DataFrame creation step

df = pd.DataFrame([k for klist in data for k in klist])

  • Step 3: Convert the C column into a string format with 3 decimal places
  • Step 4: Concatenate the list as a string using .join() while adding '\n' as separator

Steps 3 & 4 are accomplished using this single line map and join function.

c_list = '\n'.join(df.C.map('{:,.3f}'.format).tolist())

  • Step 5: Print the data in raw format to get the \n as well.

Step 5 only prints the result, on a separate line of code. I am using repr so that the \n characters are shown literally and the output stays on one line.

print (repr(c_list))
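To illustrate why repr is used in Step 5, here is a small sketch comparing the two ways of printing a string that contains newlines. The three values are taken from the expected output; the variable `escaped` is only an illustrative name.

```python
# A string with real newline characters between the values.
c_list = '\n'.join(['8.000', '4.840', '1.416'])

print(c_list)        # newlines are rendered, so this spans three lines
print(repr(c_list))  # one line: quotes included, newlines shown as \n

# If the surrounding quotes are unwanted, escape the newlines explicitly.
escaped = c_list.replace('\n', r'\n')
print(escaped)
```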

You can do it as follows:

data = [[ {'C': 8, 'G': 1, 'T': 1},
          {'C': 4.84, 'G': 1, 'T': 2},
          {'C': 1.416, 'G': 1, 'T': 3}],
         [{'C': 2.56, 'G': 1, 'T': 1},
          {'C': 3.104, 'G': 1, 'T': 2},
          {'C': 3.09, 'G': 1, 'T': 3}],
         [{'C': 1.184, 'G': 1, 'T': 1},
          {'C': 7.5, 'G': 1, 'T': 2},
          {'C': 15, 'G': 1, 'T': 3}]]
import pandas as pd
df = pd.DataFrame([k for klist in data for k in klist])
c_list = '\n'.join(df.C.map('{:,.3f}'.format).tolist())
print (repr(c_list))

The output of this will be:

'8.000\n4.840\n1.416\n2.560\n3.104\n3.090\n1.184\n7.500\n15.000'

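To print 3 items on each line, keep the formatted values as a list and join them in chunks of three. (c_list above is already a single joined string, so slicing it would slice characters rather than values.)

c_vals = df.C.map('{:,.3f}'.format).tolist()
for i in range(0, len(c_vals), 3):
    print(repr('\n'.join(c_vals[i:i+3])))

or, to print the same text without the surrounding quotes:

for i in range(0, len(c_vals), 3):
    print(r'\n'.join(c_vals[i:i+3]))

The output of the first loop will be: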

'8.000\n4.840\n1.416'
'2.560\n3.104\n3.090'
'1.184\n7.500\n15.000'

I assume you are asking for this.

I added an extra entry to the input data: {'C': 12.5, 'G': 2, 'T': 8}

The output is as follows:

'8.000\n4.840\n1.416'
'2.560\n3.104\n3.090'
'1.184\n7.500\n15.000'
'12.500'
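As a hedged alternative that does not depend on a fixed chunk size, each record can be tagged with the index of the inner list it came from and the strings built per group. This is only a sketch: the column names `group` and `C_str`, the name `data_extra`, and placing the extra entry in its own inner list are all assumptions made for illustration.

```python
import pandas as pd

data = [[{'C': 8, 'G': 1, 'T': 1},
         {'C': 4.84, 'G': 1, 'T': 2},
         {'C': 1.416, 'G': 1, 'T': 3}],
        [{'C': 2.56, 'G': 1, 'T': 1},
         {'C': 3.104, 'G': 1, 'T': 2},
         {'C': 3.09, 'G': 1, 'T': 3}],
        [{'C': 1.184, 'G': 1, 'T': 1},
         {'C': 7.5, 'G': 1, 'T': 2},
         {'C': 15, 'G': 1, 'T': 3}]]

# Assumption: the extra entry sits in its own inner list.
data_extra = data + [[{'C': 12.5, 'G': 2, 'T': 8}]]

# Flatten, remembering which inner list each record came from.
df = pd.DataFrame(
    [{**rec, 'group': g} for g, inner in enumerate(data_extra) for rec in inner]
)

# Format C to 3 decimal places, then join the strings within each group.
joined = (
    df.assign(C_str=df['C'].map('{:.3f}'.format))
      .groupby('group')['C_str']
      .agg('\n'.join)
)

for s in joined:
    print(repr(s))
```

Grouping on the original nesting means inner lists of different lengths each still produce exactly one string.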

Upvotes: 1

Rob Raymond

Reputation: 31146

  1. Convert it to a NumPy array.
  2. reshape() it to effectively 1D.
  3. Use json_normalize() to extract the embedded dicts into columns.

data = [[ {'C': 8, 'G': 1, 'T': 1},
          {'C': 4.84, 'G': 1, 'T': 2},
          {'C': 1.416, 'G': 1, 'T': 3}],
         [{'C': 2.56, 'G': 1, 'T': 1},
          {'C': 3.104, 'G': 1, 'T': 2},
          {'C': 3.09, 'G': 1, 'T': 3}],
         [{'C': 1.184, 'G': 1, 'T': 1},
          {'C': 7.5, 'G': 1, 'T': 2},
          {'C': 15, 'G': 1, 'T': 3}]]

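import numpy as np
import pandas as pd

# Object array of dicts, one row per inner list.
a = np.array(data)
# Reshape to a single row, take it, and expand each dict into columns.
df = pd.json_normalize(a.reshape(1, a.shape[0]*a.shape[1])[0])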

output

     C  G  T
 8.000  1  1
 4.840  1  2
 1.416  1  3
 2.560  1  1
 3.104  1  2
 3.090  1  3
 1.184  1  1
 7.500  1  2
15.000  1  3
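The reshape above relies on every inner list having the same length, since it uses both a.shape[0] and a.shape[1]. If that cannot be guaranteed, a possible alternative (not part of this answer) is to flatten with itertools.chain.from_iterable from the standard library. The shortened data below is hypothetical and only meant to show uneven inner lists.

```python
from itertools import chain

# Hypothetical input with inner lists of different lengths.
data = [
    [{'C': 8, 'G': 1, 'T': 1}, {'C': 4.84, 'G': 1, 'T': 2}],
    [{'C': 2.56, 'G': 1, 'T': 1}],
]

# Concatenate the inner lists into one flat list of dictionaries.
records = list(chain.from_iterable(data))

print(len(records))
print([rec['C'] for rec in records])
```

The resulting flat list of dictionaries can then be passed to pd.DataFrame, as the list-comprehension answer in this thread does.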

Upvotes: 2
