Read multiple files in Python and combine filenames and content into a DataFrame

Question

I have the following list of file names in Python, created by reading files:

files_list = ["A", "B", "C", "D"]

The contents of the files are lists of strings, as follows:

A = ["A1"]
B = ["A2", "B1"]
C = ["A3", "B3", "C3", "C3"]
D = []

I would like to create the following dataframe

Col1   Col2
A      A1
B      A2, B1
C      A3, B3, C3
D

The filenames should be rendered as one column and the second column should contain the content of the files as a single line.

I tried the following code using a for loop. Note that this is a toy dataset; my real dataset is somewhat larger.

import pandas as pd


df3 = pd.DataFrame()
for i in list_name:
    for j in i:
        df3["Col1"] = j
        df3["Col2"] = i

How do I accomplish this using a for loop? I would appreciate it if someone could take a look. The df3 object I generated was empty.

Adirio · Accepted Answer

import pandas as pd


files_list = ["A", "B", "C", "D"]
files_cont = [
    ["A1"],
    ["A2", "B1"],
    ["A3", "B3", "C3", "C3"],
    [],
]

df3 = pd.DataFrame({"contents": list(map(sorted, map(set, files_cont)))}, index=files_list)
print(df3)

       contents
A          [A1]
B      [A2, B1]
C  [A3, B3, C3]
D            []

We create a new pd.DataFrame from a dict, so that the key is used as the column name (I used "contents", but choose whatever you like), and pass the index keyword argument to label the rows with the file names.
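If the file names should be an ordinary column rather than the index, as in the question's Col1/Col2 layout, a minimal sketch could pass two dict keys instead. The name df_cols is only illustrative, and the contents are left as lists here:

```python
import pandas as pd

files_list = ["A", "B", "C", "D"]
files_cont = [["A1"], ["A2", "B1"], ["A3", "B3", "C3", "C3"], []]

# Each dict key becomes a column; without an index argument,
# pandas assigns the default integer index 0..3.
df_cols = pd.DataFrame({"Col1": files_list, "Col2": files_cont})
print(df_cols)
```

Here each cell of Col2 still holds a Python list, duplicates included.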

Since the expected output in the question has the duplicates removed, each content list is first passed to the set function to eliminate duplicate elements, then to the sorted function to get back a list with the elements in sorted order. If you don't need that, just use {"contents": files_cont} instead.
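Since the question asks for a for loop and for the contents on a single line, here is a rough sketch of that variant. It is not the code above but an alternative: it assumes first-seen order should be kept (using dict.fromkeys instead of set and sorted) and joins each list with ", ". The names rows and df_loop are illustrative:

```python
import pandas as pd

files_list = ["A", "B", "C", "D"]
files_cont = [["A1"], ["A2", "B1"], ["A3", "B3", "C3", "C3"], []]

rows = []
for name, items in zip(files_list, files_cont):
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique_items = list(dict.fromkeys(items))
    # join the items into one string; an empty list gives ""
    rows.append({"Col1": name, "Col2": ", ".join(unique_items)})

# Build the frame once from the collected rows instead of
# assigning columns inside the loop.
df_loop = pd.DataFrame(rows, columns=["Col1", "Col2"])
print(df_loop)
```

Collecting the rows in a list and constructing the DataFrame once at the end avoids the problem in the question, where assigning a scalar to a column of an empty DataFrame leaves it with no rows.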
