Reputation: 1345
I am reading a CSV file into a pandas dataframe using Python. I want to read the contents of a list of text files into a new column of that dataframe.
The original CSV file I'm reading from looks like this:
Name,PrivateIP
bastion001,10.238.2.166
logicmonitor001,10.238.2.52
logicmonitor002,45.21.2.13
The original dataframe looks like this.
hosts_list = dst = os.path.join('..', '..', 'source_files', 'aws_hosts_list', 'aws_hosts_list.csv')
fields = ["Name", "PrivateIP"]
orig_df = pd.read_csv(hosts_list, skipinitialspace=True, usecols=fields)
print(f"Orig DF: {orig_df}")
Orig DF:
Name PrivateIP
0 bastion001 10.238.2.166
1 logicmonitor001 10.238.2.52
2 logicmonitor002 45.21.2.13
The text directory has a bunch of text files in it with memory readings in each:
bastion001-memory.txt B-mmp-rabbitmq-core002-memory.txt logicmonitor002-memory.txt mmp-cassandra001-memory.txt company-division-rcsgw002-memory.txt
B-mmp-platsvc-core001-memory.txt haproxy001-memory.txt company-cassandra001-memory.txt mmp-cassandra002-memory.txt company-waepd001-memory.txt
B-mmp-platsvc-core002-memory.txt haproxy002-memory.txt company-cassandra002-memory.txt mmp-cassandra003-memory.txt company-waepd002-memory.txt
B-mmp-rabbitmq-core001-memory.txt logicmonitor001-memory.txt company-cassandra003-memory.txt company-division-rcsgw001-memory.txt company-waepd003-memory.txt
Each file looks similar to this:
cat haproxy001-memory.txt
7706172
I read each file into the existing dataframe.
rowcount == 0
text_path = '/home/tdun0002/stash/cloud_scripts/output_files/memory_stats/text/'
filelist = os.listdir(text_path)
for filename in filelist:
    if rowcount == 0:
        pass
    else:
        my_file = text_path + filename
        print(f"Adding {filename} to DF")
        try:
            orig_df = pd.update(my_file)
            print(f"Data Frame: {orif_df}")
            ++rowcount
        except Exception as e:
            print(f"An error has occurred: {e}")
But when I try to read the resulting dataframe again it has not been updated. I gave the new DF a new name for clarity.
result_df = orig_df
pd.options.display.max_rows
print(f"\nResult Data Frame:\n{result_df}\n")
Result Data Frame:
Name PrivateIP
0 bastion001 10.238.2.166
1 logicmonitor001 10.238.2.52
2 logicmonitor002 45.21.2.13
How can I create a new column called Memory in the DF and add the contents of the text files to that column?
Upvotes: 1
Views: 770
Reputation: 91
Here's some code that I hope will work. It's a bit clunky, but you'll get the idea. The comments inside explain each step.
import pandas as pd
import os
from os import listdir
from os.path import isfile, join
# get all files in the text directory
# I used os.getcwd() to get the current directory
# if your text files are in another dir, then write the exact dir location
# this gets you all files in your text dir
onlyfiles = [f for f in listdir(os.getcwd()) if isfile(join(os.getcwd(), f))]
# convert it to series
memory_series = pd.Series(onlyfiles)
# an apply function to get just txt files
# others will be returned as None
def file_name_getter(x):
    # split on the last dot so names without an extension don't raise IndexError
    names = x.rsplit(".", maxsplit=1)
    if len(names) == 2 and names[1] == "txt":
        return names[0]
    else:
        return None
# apply the function and get a new series with name values
mem_list = memory_series.apply(lambda x: file_name_getter(x))
# now read first line of txt files
# and this is the function for it
def get_txt_data(x):
    if x is not None:
        with open(f'{x}.txt') as f:
            return int(f.readline().rstrip())
    else:
        return 0
# apply the function, get a new series with memory values
mem_val_list = mem_list.apply(lambda x: get_txt_data(x))
# create a df where our Name and Memory data are present
# cast Memory data as int
df = pd.DataFrame(mem_val_list, columns=["Memory"], dtype="int")
df["Name"] = mem_list
# get rid of -memory now
def name_normalizer(x):
    if x is None:
        return x
    else:
        return x.rsplit("-", maxsplit=1)[0]
# apply function
df["Name"] = df["Name"].apply(lambda x: name_normalizer(x))
# our sample orig_df
orig_df = pd.DataFrame([["algo_2", "10.10.10"], ["other", "20.20.20"]], columns=["Name", "PrivateIP"])
# merge on Name; the default is an inner join, so rows without a match
# are simply dropped and all matching names get their memory values
final_df = orig_df.merge(df, on="Name")
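One thing to keep in mind: merge defaults to how="inner", so any host in orig_df that has no matching memory file will not appear in final_df at all. If you would rather keep every host and leave the missing readings empty, a left join is one option. Here is a minimal sketch of that idea. The helper name attach_memory is made up for illustration, the host names and IPs are the ones from the question, and the memory numbers (apart from the 7706172 value shown in the question) are invented.

```python
import pandas as pd


def attach_memory(hosts_df, memory_df):
    """Left-join memory readings onto the hosts, keeping every host row."""
    # how="left" keeps all rows of hosts_df; hosts without a reading get NaN
    return hosts_df.merge(memory_df, on="Name", how="left")


# Host names and IPs taken from the question
hosts = pd.DataFrame(
    {
        "Name": ["bastion001", "logicmonitor001", "logicmonitor002"],
        "PrivateIP": ["10.238.2.166", "10.238.2.52", "45.21.2.13"],
    }
)

# Illustrative readings only; logicmonitor002 is left out on purpose
readings = pd.DataFrame(
    {"Name": ["bastion001", "logicmonitor001"], "Memory": [7706172, 4046840]}
)

result = attach_memory(hosts, readings)
print(result)
```

Because of the NaN, the Memory column comes back as float. If you want whole numbers with gaps, casting with .astype("Int64") (pandas' nullable integer type) should do it.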
edit: fixed Name to be returned correctly. (xxx-memory to xxx)
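If you don't need the intermediate Series, a shorter route might be to build a plain dict of host name to memory value from the *-memory.txt files and map it onto the Name column. This is only a sketch under a few assumptions: the function name read_memory_readings and the SUFFIX constant are my own naming, each file is assumed to hold a single integer (as in the question's haproxy001 example), and the demo writes throwaway files to a temporary directory so that it runs anywhere. In your case you would pass your real text directory instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed naming pattern, based on the file listing in the question
SUFFIX = "-memory.txt"


def read_memory_readings(text_dir):
    """Return {host_name: memory_value} for every *-memory.txt file in text_dir."""
    readings = {}
    for path in sorted(Path(text_dir).glob(f"*{SUFFIX}")):
        host = path.name[: -len(SUFFIX)]  # "haproxy001-memory.txt" -> "haproxy001"
        text = path.read_text().strip()
        if text:  # skip empty files instead of failing on int("")
            readings[host] = int(text.split()[0])
    return readings


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in files for the real directory; the bastion001 value is made up
    Path(tmp, "bastion001-memory.txt").write_text("8000000\n")
    Path(tmp, "haproxy001-memory.txt").write_text("7706172\n")
    memory_by_host = read_memory_readings(tmp)

orig_df = pd.DataFrame(
    {
        "Name": ["bastion001", "logicmonitor001"],
        "PrivateIP": ["10.238.2.166", "10.238.2.52"],
    }
)

# Hosts that have no file end up as NaN in the new column
orig_df["Memory"] = orig_df["Name"].map(memory_by_host)
print(orig_df)
```

Series.map with a dict looks up each Name and fills in NaN where there is no key, so no row of orig_df is dropped.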
Upvotes: 1