Cmagelssen

Reputation: 660

Why do I get Length of values (1) does not match length of index (3) when using random.sample()?

My Python code returns the following error message:

  File "/Users/christianmagelssen/Desktop/Koding/analyse/moduler/resultater.py", line 64, in allokereGrupper
    group1['GRUPPE'] = velger
ValueError: Length of values (1) does not match length of index (3)

I have tried many different things to solve this issue:

  1. I have tried changing k to 1 or 2, but that doesn't help.
  2. I have tried different pandas methods to drop duplicates, including .unique() and the drop_duplicates() call that I am using now.

I know that my code worked 3 months ago, but on a different dataset. Can someone help me understand what I am doing wrong here?

Here is all my code

results.py

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import random

class Resultat:

    def lastInnOgRydd(path, LagreCsv = False):
        df = pd.read_csv(path, skiprows=2, decimal=".")
        filt = df['FINISH'] == 'DNF'
        dnf = df[filt]
        dnf = dnf.replace('DNF', 1)
        if LagreCsv == True:
            dnf.to_csv('DNF.csv')
        df.replace('DNF', np.NaN, inplace=True)
        df.replace('GARBAGE GARBAGE', np.NaN, inplace=True) # There is probably a better solution for this
        df.dropna(subset=['FINISH'], inplace=True)
        df.dropna(subset=['NAME'], inplace=True)
        return df

    def endreDataType(df):
        df["FINISH"] = df["FINISH"].str.replace(',', '.').astype(float)
        df["INTER 1"] = df["INTER 1"].str.replace(',', '.').astype(float)
        df["SECTION IM4-FINISH"] = df["SECTION IM4-FINISH"].str.replace(',', '.').astype(float)
        df["COMMENT"] = df['COMMENT'].astype(int)
        df["COMMENT"] = df['COMMENT'].astype(str)
        df["COMMENT"] = df['COMMENT'].str.replace('11', 'COURSE 1')
        df["COMMENT"] = df['COMMENT'].str.replace('22', 'COURSE 2')
        df["COMMENT"] = df['COMMENT'].str.replace('33', 'COURSE 3')
        df["COMMENT"] = df['COMMENT'].str.replace('55', 'UTKJORING')
        df["COMMENT"] = df['COMMENT'].str.replace('99', 'STRAIGHT-GLIDING')
        pd.to_numeric(df['FINISH'], downcast='float', errors='raise')
        pd.to_numeric(df['INTER 1'], downcast='float', errors='raise')
        pd.to_numeric(df['SECTION IM4-FINISH'], downcast='float', errors='raise')
        return df

    def navnendringCommentTilCourse(df):
        df.rename(columns={'COMMENT': 'COURSE'}, inplace=True)
        return df

    def finnBesteRunder(df):
        grupper = df.groupby(['BIB#', 'COURSE'])
        bestruns = grupper['FINISH'].apply(lambda x: x.nsmallest(2).mean()).reset_index()
        df1 = bestruns.pivot('BIB#', 'COURSE', 'FINISH').reset_index()
        df1['GJENNOMSNITT'] = df1['COURSE 1'].add(df1['COURSE 2']).add(df1['COURSE 3']).div(3)
        #df1['PRESTASJON'] = df1['MEAN'].div(df1['STRAIGHT-GLIDING']) # removed for now, but it must be included in the proper analysis
        return df1

    def allokereGrupper(df1):
        df1 = df1.sort_values(by='GJENNOMSNITT', ascending=True)
        mask = np.arange(len(df1)) % 2
        group1 = df1.loc[mask == 0]
        group1 =  group1.drop_duplicates(subset=['BIB#'])
        print(group1)
        group2 = df1.loc[mask == 1]
        group2 =  group2.drop_duplicates(subset=['BIB#'])
        print(group2)
        
        grupper = ['RANDOM', 'BLOCKED']

        for i in group1['BIB#']:
            velger = random.sample(grupper, k=1)
        group1['GRUPPE'] = velger

 

main.py

from moduler import Resultat


path = "http://www.cmagelssen.no/pilot2.csv"

df = Resultat.lastInnOgRydd(path)
df = Resultat.endreDataType(df)
df = Resultat.navnendringCommentTilCourse(df)
df = Resultat.finnBesteRunder(df)
df = Resultat.allokereGrupper(df)



Upvotes: 1

Views: 251

Answers (1)

Alan

Reputation: 2498

The problem is that velger is a list. random.sample() always returns a list, even with k=1, so velger is either ['RANDOM'] or ['BLOCKED']. To fill the whole 'GRUPPE' column with a single value, you must assign a scalar, such as a string or an int.

If you assign a list-like value instead, pandas expects it to have the same length as the DataFrame and fills each row with the corresponding element (the 3rd row gets the 3rd list element, for example). Your list has length one, but group1 has three rows, which is exactly what the error message reports. Maybe in your previous dataset group1 had only one row.
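To make the difference concrete, here is a minimal sketch. The small DataFrame named frame and its values are made up for illustration and are not taken from your dataset:

```python
import pandas as pd

# Hypothetical three-row stand-in for group1.
frame = pd.DataFrame({"BIB#": [1, 2, 3]})

# A scalar (a plain string counts as one here) is broadcast to every row.
frame["GRUPPE"] = "RANDOM"
broadcast_values = frame["GRUPPE"].tolist()

# A list is matched to the rows by position, so its length must equal len(frame).
try:
    frame["GRUPPE"] = ["RANDOM"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A list with one element per row is accepted.
frame["GRUPPE"] = ["RANDOM", "BLOCKED", "RANDOM"]
```

The middle assignment should fail with the same kind of ValueError as in the question, while the first and last succeed.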

It's not entirely clear to me what the goal of your code is, but if your intention is to fill every cell in the 'GRUPPE' column with the same value (either 'RANDOM' or 'BLOCKED'), then change:

group1['GRUPPE'] = velger

to

group1['GRUPPE'] = velger[0]
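If the intention was instead a separate random draw for each row (this is only a guess, since the question does not say), the loop in allokereGrupper will not do that: it overwrites velger on every pass, so only the last draw survives. A sketch of one possible alternative, where bibs is a made-up stand-in for group1['BIB#']:

```python
import random

grupper = ["RANDOM", "BLOCKED"]
bibs = [11, 12, 13]  # hypothetical stand-in for group1['BIB#']

# random.choices draws with replacement, so k may exceed len(grupper).
# The result has exactly one label per row.
velger = random.choices(grupper, k=len(bibs))
```

Because velger then has the same length as the data, group1['GRUPPE'] = velger would line up row by row. Note that this does not guarantee equally sized groups; if that matters, a different scheme is needed.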

Upvotes: 1
