Sitz Blogz
Sitz Blogz

Reputation: 1061

Find mean of value row wise in a list

I am trying find mean of the values in two columns.

Input:

tweetcricscore  34 #afgvssco   51
tweetcricscore  23 #afgvszim   46
tweetcricscore  24 #banvsire   12
tweetcricscore  456 #banvsned  46
tweetcricscore  653 #canvsnk   1
tweetcricscore  789 #cricket   178
tweetcricscore  625 #engvswi   46
tweetcricscore  86 #hkvssco    23
tweetcricscore  3 #indvsban    1
tweetcricscore  87 #sausvsvic  8
tweetcricscore  98 #wt20       56

I am trying to find the mean value of col[1] and col[3] horizontally. I.e.

34 + 51 / 2 = 42.5

Code:

data = np.genfromtxt('keyword.csv', delimiter=',', comments=None)

d0=data[:,1]
d1=data[;,3]

and write the output d2 in col[4] in same input file.

I can find many samples for the vertical calculation of mean but not able to find horizontal. The array is NOT to be sorted as both the values are related to each other with respect to col[0] and col[2]. Any suggestion is appreciated.

Thanks in advance.

Upvotes: 0

Views: 1509

Answers (2)

Grr
Grr

Reputation: 16119

In numpy you can accomplish this with np.mean (see docs here. You can run this on either axis; 0 being the vertical and 1 being the horizontal The trick is that you need to remove your columns without text first. After you run your line data = np.genfromtxt('keyword.csv', delimiter=',', comments=None) next delete the columns with text. In this case column 0 and 2, so you can just delete every other row starting with 0 using data2 = np.delete(data, np.s_[::2], 1). Once you have this you can then use d3 = np.mean(data2, axis=1) and will get an array of the desired means. To get this as the forth column in the original data then use data = np.insert(data, data.shape[1], d3, axis=1)

All together:

data = np.genfromtxt('keyword.csv', delimiter=',', comments=None)
data2 = np.delete(data, np.s_[::2], 1)
d3 = np.mean(data2, axis=1)
data = np.insert(data, data.shape[1], d3, axis=1)

My biggest caveat with this is the data2 line, it will only work with a table structured the way you displayed. If you have more columns or columns that aren't ordered as text, float, text, float, ... you will have to rewrite it to correctly remove the columns with text. If necessary you could write up a script to search for columns with NaN values once loaded into a numpy array. eg np.all(np.isnan(data), axis=0) will return a 1-d array with True, False values for whether all values in a column are nan or not. You could then use this to remove columns at index with values of true.

Functionally:

indices = np.all(np.isnan(data), axis=0)
indices = np.where(indices == True)
data2 = np.delete(data, np.s_[indices[0]],1)

Pandas is great because it will automatically ignore columns with text for the mean value. It makes this calc much simpler:

import pandas as pd
df = pd.read_csv('keywords.csv', header=None)
df[df.shape[1]] = df.mean(axis=1)

This creates a new column at the end of the dataframe containing the mean value of integer/float values in that row.

Upvotes: 1

flyingmeatball
flyingmeatball

Reputation: 7997

You can easily do this in Pandas:

import pandas as pd

df = pd.read_csv('keyword.csv', header = None)

df.columns = ['col1','col2','col3','col4']

df['avgCol'] = (df['col2'] + df['col4'])/2

Upvotes: 3

Related Questions