Reputation: 689
I have a CSV in the format below:
Used CPU   Used Memory   Hard CPU   Hard Memory
1          4Gi           50         24Gi
0          0             0          0
2          4Gi           4          8Gi
2          4Gi           4          8Gi
0          0             100m       128Mi
51550m     39528Mi       56         47Gi
They are string values. In this table, 51550m means millicores, which I need to convert to cores. 39528Mi is mebibytes, which I need to convert to gibibytes (or something similar). I want to know how I can read each value column-wise and, if I find an m suffix (as in 51550m), convert it to cores. Then I want to convert all the values of the column to numbers so that I can add them up.
I want to use pandas but I am very new to it. I know I can try df["col_name"].astype("int") to convert to integer, but I also need to interpret the millicore values to convert them.
Any help is much appreciated.
Expected output: all values must be floats. I got the factors below from conversions:
100 millicores = 1/10 core
1 mebibyte = 0.00104858 GB
Used CPU   Used Memory   Hard CPU   Hard Memory
1          4.296         50         25.7698
2          4.296         4          8.592
2          4.296         4          8.592
0          0             .1         0.134218
51.550     41.448112     56         50.4659
Upvotes: 5
Views: 6198
Reputation: 13349
You can do something like this.
UPDATED:
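import numpy as np
import pandas as pd

df = pd.read_csv("your_csv_file")
# Alternatively, build the sample data directly:
# df = pd.DataFrame({'Used CPU': ['1', '0', '2', '2', '0', '51550m'],
#                    'Used Memory': ['4Gi', '0', '4Gi', '4Gi', '0', '39528Mi'],
#                    'Hard CPU': ['50', '0', '4', '4', '100m', '56'],
#                    'Hard Memory': ['24Gi', '0', '8Gi', '8Gi', '128Mi', '47Gi']})

# Multipliers per suffix: millicores -> cores, Mi -> GB, Gi -> GB (rounded)
units = {'m': 0.001, 'Mi': 0.00104858, 'Gi': 1.0737425}

def conversion(x):
    # Strip a known suffix and scale; values without a suffix pass through as strings
    x = str(x)
    for key, factor in units.items():
        if x.endswith(key):
            return float(x[:-len(key)]) * factor
    return x

# Apply to every cell (on pandas >= 2.1, applymap is deprecated in favour of df.map)
df = df.applymap(conversion)
df = df.astype(np.float64)
print(df)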
INPUT:
  Hard CPU Hard Memory Used CPU Used Memory
0       50        24Gi        1         4Gi
1        0           0        0           0
2        4         8Gi        2         4Gi
3        4         8Gi        2         4Gi
4     100m       128Mi        0           0
5       56        47Gi   51550m     39528Mi
OUTPUT:
   Hard CPU  Hard Memory  Used CPU  Used Memory
0      50.0     25.76980      1.00      4.29497
1       0.0     0.000000      0.00      0.00000
2       4.0     8.589940      2.00      4.29497
3       4.0     8.589940      2.00      4.29497
4       0.1     0.134218      0.00      0.00000
5      56.0    50.465898     51.55     41.44827
All columns are now float64, so you can do arithmetic on them, for example df['Hard Memory'] + df['Used Memory'].
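The multipliers in units are rounded, and the 'Gi' entry (1.0737425) is slightly off from the exact value. As a small sanity check, assuming the target unit for memory is decimal gigabytes (1 GB = 10**9 bytes, as in the question's conversion list), the exact factors follow from 1 Mi = 2**20 bytes and 1 Gi = 2**30 bytes. The names below (exact_units and the three constants) are illustrative only and not part of the answer above.

```python
# Exact multipliers, assuming memory should end up in decimal gigabytes (1 GB = 10**9 bytes).
MILLI_TO_CORES = 1 / 1000     # 1000 millicores = 1 core
MIB_TO_GB = 2**20 / 10**9     # 1 Mi = 1,048,576 bytes
GIB_TO_GB = 2**30 / 10**9     # 1 Gi = 1,073,741,824 bytes

# Hypothetical drop-in replacement for the rounded `units` dictionary.
exact_units = {"m": MILLI_TO_CORES, "Mi": MIB_TO_GB, "Gi": GIB_TO_GB}

# Spot checks against the last row of the question's table.
print(51550 * exact_units["m"])    # Used CPU in cores
print(39528 * exact_units["Mi"])   # Used Memory in GB
print(47 * exact_units["Gi"])      # Hard Memory in GB
```

For sums over a handful of rows the rounded factors are close enough; the exact ones matter mostly if the results are compared against values computed elsewhere.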
Upvotes: 4
Reputation: 48
Writing custom functions is very easy in pandas. Maybe you can try these:
# import
import pandas as pd

# reading file
df = pd.read_csv("PATH_TO_CSV_FILE")

def func_CPU(x):
    """Convert a CPU value to cores (a trailing "m" means millicores)."""
    x = str(x)
    if x.endswith("m"):
        return float(x[:-1]) / 1000
    return float(x)

def func_Memory(x):
    """Convert a memory value with a Gi or Mi suffix to GB."""
    x = str(x)
    if x.endswith("Gi"):
        return float(x[:-2]) * 1024 * 0.00104858
    elif x.endswith("Mi"):
        return float(x[:-2]) * 0.00104858
    return float(x)

# Column names as they appear in the question's header
df["Used CPU"] = df["Used CPU"].apply(func_CPU)
df["Used Memory"] = df["Used Memory"].apply(func_Memory)
df["Hard CPU"] = df["Hard CPU"].apply(func_CPU)
df["Hard Memory"] = df["Hard Memory"].apply(func_Memory)
print(df)
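The per-cell functions above can also be written with pandas' vectorized string methods, which avoids calling a Python function for every value. This is only a sketch: the sample frame is copied from the question, the helper name convert_column and the FACTORS table are made up for illustration, and memory is assumed to be wanted in decimal GB.

```python
import pandas as pd

# Sample data copied from the question; use pd.read_csv(...) for a real file.
df = pd.DataFrame({
    "Used CPU": ["1", "0", "2", "2", "0", "51550m"],
    "Used Memory": ["4Gi", "0", "4Gi", "4Gi", "0", "39528Mi"],
    "Hard CPU": ["50", "0", "4", "4", "100m", "56"],
    "Hard Memory": ["24Gi", "0", "8Gi", "8Gi", "128Mi", "47Gi"],
})

# Multiplier per suffix; "" covers plain numbers. Memory ends up in GB (10**9 bytes).
FACTORS = {"": 1.0, "m": 1e-3, "Mi": 2**20 / 1e9, "Gi": 2**30 / 1e9}

def convert_column(col):
    """Split each cell into a number and an optional suffix, then scale by the suffix's factor."""
    parts = col.astype(str).str.extract(r"^\s*([0-9.]+)\s*(m|Mi|Gi)?\s*$")
    number = pd.to_numeric(parts[0], errors="coerce")  # NaN for cells that do not match
    factor = parts[1].fillna("").map(FACTORS)          # missing suffix -> factor 1.0
    return number * factor

converted = df.apply(convert_column)  # convert_column runs once per column
print(converted.dtypes)
print(converted.sum())
```

Cells that do not match the pattern come out as NaN rather than raising, so converted.isna().sum() is a quick way to spot unexpected formats.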
Upvotes: 2
Reputation: 1607
I didn't find any simple way, so here is one dirty way to do it. Basically, your columns contain different suffixes (Gi and Mi), and each needs a separate calculation, so you can do something like this. I have left out the calculation for the Hard CPU column, but the idea is the same: follow the same pattern as for the Used CPU column.
import numpy as np
import pandas as pd

# Millicores -> cores
df['Used CPU'] = np.where(df['Used CPU'].str.contains('m', na=False),
                          pd.to_numeric(df['Used CPU'].map(lambda x: str(x)[:-1]), errors='coerce') / 1000,
                          df['Used CPU'])
# Mi -> GB
df['Used Memory'] = np.where(df['Used Memory'].str.contains('Mi', na=False),
                             pd.to_numeric(df['Used Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 0.00104858,
                             df['Used Memory'])
# Gi -> GB (1 Gi = 1024 Mi, using the factor from the question)
df['Hard Memory'] = np.where(df['Hard Memory'].str.contains('Gi', na=False),
                             pd.to_numeric(df['Hard Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 1024 * 0.00104858,
                             df['Hard Memory'])
The Used Memory column contains Gi values too, so you can repeat the same pattern for it:
# na=False skips cells that an earlier step has already turned into numbers
df['Used Memory'] = np.where(df['Used Memory'].str.contains('Gi', na=False),
                             pd.to_numeric(df['Used Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 1024 * 0.00104858,
                             df['Used Memory'])
Each item in a column needs a different conversion depending on which suffix it carries, and this is the simplest solution I can think of. Sorry about that.
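The repeated np.where pattern above can be folded into a loop over columns and suffixes, ending with every column as float64. This is a rough sketch rather than a definitive version: the rules mapping and the sample frame are illustrative, and the factors are the rounded ones quoted in the question.

```python
import pandas as pd

# Sample data copied from the question; use pd.read_csv(...) for a real file.
df = pd.DataFrame({
    "Used CPU": ["1", "0", "2", "2", "0", "51550m"],
    "Used Memory": ["4Gi", "0", "4Gi", "4Gi", "0", "39528Mi"],
    "Hard CPU": ["50", "0", "4", "4", "100m", "56"],
    "Hard Memory": ["24Gi", "0", "8Gi", "8Gi", "128Mi", "47Gi"],
})

# Suffix -> multiplier for each kind of column (factors as quoted in the question).
cpu = {"m": 0.001}
mem = {"Mi": 0.00104858, "Gi": 1024 * 0.00104858}
rules = {"Used CPU": cpu, "Hard CPU": cpu, "Used Memory": mem, "Hard Memory": mem}

for col, suffixes in rules.items():
    raw = df[col].astype(str)
    # Plain numbers parse directly; cells with a suffix become NaN for now.
    result = pd.to_numeric(raw, errors="coerce")
    for suffix, factor in suffixes.items():
        mask = raw.str.endswith(suffix)
        scaled = pd.to_numeric(raw.str[: -len(suffix)], errors="coerce") * factor
        # Keep existing values where the suffix is absent, take the scaled value where it is present.
        result = result.where(~mask, scaled)
    df[col] = result

print(df.dtypes)
print(df.sum())
```

Because each column is rebuilt from the original strings, the order in which suffixes are handled does not matter, unlike chaining np.where calls on a column that is already partly converted.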
Upvotes: 1