Heenashree Khandelwal

Reputation: 689

convert values of a column in pandas

I have a CSV in the format below:

Used CPU    Used Memory Hard CPU    Hard Memory
    1       4Gi         50          24Gi
    0       0           0           0
    2       4Gi         4           8Gi
    2       4Gi         4           8Gi
    0       0           100m        128Mi
    51550m  39528Mi     56          47Gi

They are string values. In this table, 51550m means millicores, which I need to convert to cores, and 39528Mi is mebibytes, which I need to convert to gibibytes (or similar). I want to know how I can read each value column-wise and, if I find an m suffix (like in 51550m), convert it to cores. Then I want to convert all the values of the column to numbers so that I can add them all up.

I want to use pandas, but I am very new to it. I know I can use df["col_name"].astype("int") to convert to integers, but I also need to interpret the millicore values and convert them.
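To show why astype alone is not enough, here is a minimal sketch (the sample Series is made up from values in the table above): astype raises on suffixed strings, while pd.to_numeric with errors="coerce" turns them into NaN, which is a handy way to find the values that still need converting.

```python
import pandas as pd

# A few values from the "Used CPU" column above (sample data for illustration)
s = pd.Series(["1", "0", "2", "51550m"])

try:
    s.astype("int")
except ValueError as exc:
    # astype cannot parse the "m" suffix
    print("astype failed:", exc)

# errors="coerce" turns unparsable strings into NaN instead of raising
numeric = pd.to_numeric(s, errors="coerce")
needs_conversion = numeric.isna()
print(s[needs_conversion].tolist())
```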

Any help is much appreciated.

Expected output: all values must be floats. I used the conversions below:

100 millicores = 0.1 cores
1 mebibyte = 0.00104858 GB


  Used CPU  Used Memory  Hard CPU  Hard Memory
  1         4.295        50        25.7698
  0         0            0         0
  2         4.295        4         8.590
  2         4.295        4         8.590
  0         0            0.1       0.134218
  51.550    41.448112    56        50.4659
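The factors above come from binary vs. decimal units: 1 MiB = 2**20 bytes and 1 GiB = 2**30 bytes, while 1 GB = 10**9 bytes. A quick check in plain Python (a sketch; the constant names are my own) reproduces the expected values. If you want GiB rather than GB, divide MiB by 1024 instead.

```python
# Binary (IEC) units vs. decimal GB
BYTES_PER_MIB = 2**20          # 1,048,576 bytes
BYTES_PER_GIB = 2**30          # 1,073,741,824 bytes
BYTES_PER_GB = 10**9

MIB_TO_GB = BYTES_PER_MIB / BYTES_PER_GB   # 0.001048576
GIB_TO_GB = BYTES_PER_GIB / BYTES_PER_GB   # 1.073741824
MILLICORE_TO_CORE = 1 / 1000

print(round(39528 * MIB_TO_GB, 6))          # 41.448112
print(round(4 * GIB_TO_GB, 6))              # 4.294967
print(round(51550 * MILLICORE_TO_CORE, 3))  # 51.55
```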

Upvotes: 5

Views: 6198

Answers (3)

Pygirl

Reputation: 13349

You can do something like this.

UPDATED:

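import numpy as np
import pandas as pd

df = pd.read_csv("your_csv_file")

# Sample data matching the question, if you want to test without a file:
# df = pd.DataFrame({'Used CPU': ['1', '0', '2', '2', '0', '51550m'],
#                    'Used Memory': ['4Gi', '0', '4Gi', '4Gi', '0', '39528Mi'],
#                    'Hard CPU': ['50', '0', '4', '4', '100m', '56'],
#                    'Hard Memory': ['24Gi', '0', '8Gi', '8Gi', '128Mi', '47Gi']})

# Multiplier for each suffix: millicores -> cores, MiB and GiB -> GB
units = {'m': 0.001, 'Mi': 0.00104858, 'Gi': 1.0737425}

def conversion(x):
    s = str(x)
    for suffix, factor in units.items():
        if s.endswith(suffix):
            # Strip the suffix and scale the numeric part
            return float(s[:-len(suffix)]) * factor
    # No suffix: the value is already a plain number
    return float(s)

df = df.applymap(conversion)  # on pandas >= 2.1, df.map(conversion) is the non-deprecated name
df = df.astype(np.float64)
print(df)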
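As an alternative to applymap (my own sketch, not part of the answer above), the number and the suffix can be split with a regular expression via Series.str.extract, and the suffix mapped to a factor, so each column is converted with vectorized string operations. The helper name to_float and the use of exact binary factors are assumptions:

```python
import pandas as pd

# Exact factors: millicores -> cores, MiB/GiB -> decimal GB (assumed target units)
FACTORS = {"": 1.0, "m": 0.001, "Mi": 2**20 / 1e9, "Gi": 2**30 / 1e9}

def to_float(col: pd.Series) -> pd.Series:
    # Split e.g. "39528Mi" into the number "39528" and the suffix "Mi"
    parts = col.astype(str).str.extract(r"^\s*([0-9.]+)\s*([A-Za-z]*)\s*$")
    number = pd.to_numeric(parts[0])
    factor = parts[1].map(FACTORS)  # unknown suffixes become NaN
    return number * factor

df = pd.DataFrame({
    "Used CPU": ["1", "0", "2", "2", "0", "51550m"],
    "Used Memory": ["4Gi", "0", "4Gi", "4Gi", "0", "39528Mi"],
    "Hard CPU": ["50", "0", "4", "4", "100m", "56"],
    "Hard Memory": ["24Gi", "0", "8Gi", "8Gi", "128Mi", "47Gi"],
})
converted = df.apply(to_float)
print(converted.round(6))
```

Values with a suffix not listed in FACTORS come out as NaN, so unexpected units are easy to spot with converted.isna().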

INPUT:

   Hard CPU  Hard Memory  Used CPU  Used Memory
0  50        24Gi         1         4Gi
1  0         0            0         0
2  4         8Gi          2         4Gi
3  4         8Gi          2         4Gi
4  100m      128Mi        0         0
5  56        47Gi         51550m    39528Mi

OUTPUT:

    Hard CPU  Hard Memory  Used CPU  Used Memory
0   50.0      25.76980     1.00      4.29497
1   0.0       0.000000     0.00      0.00000
2   4.0       8.589940     2.00      4.29497
3   4.0       8.589940     2.00      4.29497
4   0.1       0.134218     0.00      0.00000
5   56.0      50.465898    51.55     41.44827

All columns are now float64, so you can do arithmetic on them directly, e.g. df['Hard Memory'] + df['Used Memory'].
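Since the question's end goal is to add the values up, here is a small sketch of the two usual operations once the columns are floats (the numbers are copied from rows 0 and 5 of the output above): element-wise addition of two columns, and one total per column with DataFrame.sum.

```python
import pandas as pd

# Two rows of already-converted values (cores and GB), taken from rows 0 and 5 above
df = pd.DataFrame({
    "Used CPU": [1.0, 51.55],
    "Used Memory": [4.29497, 41.44827],
    "Hard CPU": [50.0, 56.0],
    "Hard Memory": [25.7698, 50.465898],
})

total_memory = df["Hard Memory"] + df["Used Memory"]  # element-wise, one value per row
column_totals = df.sum()                                # one total per column
print(total_memory)
print(column_totals)
```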

Upvotes: 4

optimusprime

Reputation: 48

Making custom functions is easy in pandas. You could try these:

# import
import pandas as pd
# reading file
df = pd.read_csv("PATH_TO_CSV_FILE")

def func_CPU(x):
    """Function for CPU-related columns: '51550m' -> 51.55 cores, '4' -> 4.0."""
    x = str(x)
    if x.endswith("m"):
        return float(x[:-1]) / 1000
    return float(x)

def func_Memory(x):
    """Function for memory-related columns: Gi and Mi values are converted to GB."""
    x = str(x)
    if x.endswith("Gi"):
        return float(x[:-2]) * 1024 * 0.00104858
    elif x.endswith("Mi"):
        return float(x[:-2]) * 0.00104858
    return float(x)

# The column names must match the CSV header exactly (spaces, not underscores)
df["Used CPU"] = df["Used CPU"].apply(func_CPU)
df["Used Memory"] = df["Used Memory"].apply(func_Memory)
df["Hard CPU"] = df["Hard CPU"].apply(func_CPU)
df["Hard Memory"] = df["Hard Memory"].apply(func_Memory)
print(df)
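The same functions can also be applied while the file is being read, through the converters parameter of pd.read_csv, which passes each raw cell string of the named column to the given function. A minimal sketch, assuming the real file is comma-separated; io.StringIO stands in for the file here:

```python
import io
import pandas as pd

def func_CPU(x):
    # millicores ("51550m") -> cores; plain numbers pass through as float
    return float(x[:-1]) / 1000 if x.endswith("m") else float(x)

def func_Memory(x):
    # Gi/Mi -> decimal GB using the factor from the question
    if x.endswith("Gi"):
        return float(x[:-2]) * 1024 * 0.00104858
    if x.endswith("Mi"):
        return float(x[:-2]) * 0.00104858
    return float(x)

# Stand-in for the real file (assumed to be comma-separated)
csv_text = """Used CPU,Used Memory,Hard CPU,Hard Memory
1,4Gi,50,24Gi
0,0,0,0
0,0,100m,128Mi
51550m,39528Mi,56,47Gi
"""

# converters= calls each function on the raw string of every cell in that column
df = pd.read_csv(io.StringIO(csv_text),
                 converters={"Used CPU": func_CPU, "Hard CPU": func_CPU,
                             "Used Memory": func_Memory, "Hard Memory": func_Memory})
print(df.dtypes)
```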

Upvotes: 2

user96564

Reputation: 1607

I didn't find a simple way, but here is one dirty way to do it. Basically, your columns contain different suffixes (Gi and Mi) that need separate calculations, so you can do something like this. I've left out the calculation for the Hard CPU column, but the idea is the same: use the same pattern as for the Used CPU column.

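import numpy as np
import pandas as pd

# Millicores -> cores (rows without "m" keep their original value)
df['Used CPU'] = np.where(df['Used CPU'].str.contains('m', na=False),
                          pd.to_numeric(df['Used CPU'].map(lambda x: str(x)[:-1]), errors='coerce') / 1000,
                          df['Used CPU'])

# Mebibytes -> GB
df['Used Memory'] = np.where(df['Used Memory'].str.contains('Mi', na=False),
                             pd.to_numeric(df['Used Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 0.00104858,
                             df['Used Memory'])

# Gibibytes -> GB (1 GiB = 1024 MiB), then Mebibytes -> GB for values like 128Mi
df['Hard Memory'] = np.where(df['Hard Memory'].str.contains('Gi', na=False),
                             pd.to_numeric(df['Hard Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 1024 * 0.00104858,
                             df['Hard Memory'])
df['Hard Memory'] = np.where(df['Hard Memory'].str.contains('Mi', na=False),
                             pd.to_numeric(df['Hard Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 0.00104858,
                             df['Hard Memory'])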

The Used Memory column has Gi values too, so you can repeat the same pattern for it:

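# na=False keeps rows that were already converted to floats from matching again
df['Used Memory'] = np.where(df['Used Memory'].str.contains('Gi', na=False),
                             pd.to_numeric(df['Used Memory'].map(lambda x: str(x)[:-2]), errors='coerce') * 1024 * 0.00104858,
                             df['Used Memory'])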

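Because items in the same column need different conversions depending on their suffix, this is the simplest solution I could think of. Once every column has been handled (including Hard CPU), convert the remaining plain strings such as '0' with df = df.apply(pd.to_numeric).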
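A variant of the same np.where idea (my own sketch, not the answerer's code) uses np.select so that every suffix in a column is handled in a single pass, which avoids re-checking rows that were already converted. The factors are the ones from the question; the helper name convert_column is hypothetical:

```python
import numpy as np
import pandas as pd

def convert_column(col: pd.Series) -> pd.Series:
    s = col.astype(str)
    conditions = [s.str.endswith("Mi"), s.str.endswith("Gi"), s.str.endswith("m"),
                  np.ones(len(s), dtype=bool)]  # catch-all: plain number, no suffix
    # Strip the 1- or 2-character suffix and parse the number; unparsable parts become NaN
    choices = [pd.to_numeric(s.str[:-2], errors="coerce") * 0.00104858,
               pd.to_numeric(s.str[:-2], errors="coerce") * 1024 * 0.00104858,
               pd.to_numeric(s.str[:-1], errors="coerce") * 0.001,
               pd.to_numeric(s, errors="coerce")]
    # np.select takes the first condition that is True in each row
    return pd.Series(np.select(conditions, choices, default=np.nan), index=col.index)

df = pd.DataFrame({"Used CPU": ["1", "0", "2", "2", "0", "51550m"],
                   "Used Memory": ["4Gi", "0", "4Gi", "4Gi", "0", "39528Mi"],
                   "Hard CPU": ["50", "0", "4", "4", "100m", "56"],
                   "Hard Memory": ["24Gi", "0", "8Gi", "8Gi", "128Mi", "47Gi"]})
df = df.apply(convert_column)
print(df)
```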

Upvotes: 1
