Reputation: 1265
I have the following list in Python, created by reading files:
files_list = ["A", "B", "C", "D"]
The contents of the files are lists of strings, as follows:
A = ["A1"]
B = ["A2", "B1"]
C = ["A3", "B3", "C3", "C3"]
D = []
I would like to create the following dataframe
Col1 Col2
A A1
B A2, B1
C A3, B3, C3
D
The filenames should go in one column, and the second column should hold the contents of each file as a single line.
I tried the following code using a for loop. Note that this is a toy dataset; my real dataset is somewhat larger.
import pandas as pd
df3 = pd.DataFrame()
for i in list_name:
    for j in i:
        df3["Col1"] = j
        df3["Col2"] = i
How do I accomplish this using a for loop? The df3 object I generated was empty. I would appreciate it if someone could take a look.
Upvotes: 0
Views: 599
Reputation: 518
Assuming your files are CSVs, you can do the following with a for loop:
import glob
from pathlib import Path
import pandas as pd

directory = "C:/your/path/to/all/files/*.csv"
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
# and build the DataFrame once at the end.
rows = []
for file in glob.glob(directory):
    col = Path(file).stem  # file name without directory or extension
    try:
        contents = pd.read_csv(file, header=None).values.flatten()
    except pd.errors.EmptyDataError:
        # raised when the file is empty and there is nothing to parse
        contents = None
    rows.append({"col": col, "contents": contents})
df3 = pd.DataFrame(rows, columns=["col", "contents"])
This gives the following DataFrame:
col contents
0 A [A1]
1 B [A2, B1]
2 C [A3, B3, C3]
3 D None
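As for why the loop in the question leaves df3 empty: assigning a scalar to a column of a DataFrame that has no rows creates the column but adds no rows, since there is no index to broadcast the value over. A minimal sketch of that behaviour, using made-up values taken from the toy data:

```python
import pandas as pd

# Start from a DataFrame with no rows and no columns, as in the question.
df3 = pd.DataFrame()

# Assigning a scalar creates the column, but there is no row to hold the value.
df3["Col1"] = "A"
df3["Col2"] = "A1"
print(len(df3))          # number of rows
print(list(df3.columns)) # the columns do exist

# Building from a list of row dicts produces actual rows.
fixed = pd.DataFrame([{"Col1": "A", "Col2": "A1"}])
print(len(fixed))
```

This is why building the rows first and constructing the DataFrame from them tends to be the safer pattern.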
Upvotes: 2
Reputation: 5286
import pandas as pd
files_list = ["A", "B", "C", "D"]
files_cont = [
    ["A1"],
    ["A2", "B1"],
    ["A3", "B3", "C3", "C3"],
    [],
]
df3 = pd.DataFrame({"contents": list(map(sorted, map(set, files_cont)))}, index=files_list)
print(df3)
contents
A [A1]
B [A2, B1]
C [A3, B3, C3]
D []
We create a new pd.DataFrame from a dict, so that the key is used as the column name (I used "contents", but choose whatever you like), and pass the index keyword argument to label the rows.
Because the expected output in the question drops the duplicate in C, each content list is first passed to set to eliminate duplicated elements, then to sorted to get back a list in sorted order. If you don't need that, just use {"contents": files_cont} instead.
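If the second column should hold a single comma-separated string per file, as in the question's expected output, and a for loop is preferred, something along these lines might work. It is only a sketch: the names Col1 and Col2 come from the question, while rows and unique_items are illustrative. It uses dict.fromkeys to drop duplicates while keeping the original order, which is a different choice from the set plus sorted approach above.

```python
import pandas as pd

files_list = ["A", "B", "C", "D"]
files_cont = [
    ["A1"],
    ["A2", "B1"],
    ["A3", "B3", "C3", "C3"],
    [],
]

rows = []  # one dict per file
for name, items in zip(files_list, files_cont):
    # dict.fromkeys drops duplicates and preserves first-seen order
    unique_items = list(dict.fromkeys(items))
    # join the items into one string; an empty list becomes ""
    rows.append({"Col1": name, "Col2": ", ".join(unique_items)})

df3 = pd.DataFrame(rows, columns=["Col1", "Col2"])
print(df3)
```

Here the row for D ends up with an empty string rather than an empty list or None.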
Upvotes: 3