Garberchov

Reputation: 67

Python Pandas DataFrame: ValueError: 2 columns passed, passed data had 3 columns

I am trying to plot a line of best fit through some LIDAR data using plotly and a pandas DataFrame. When I try to create the DataFrame, I get ValueError: 2 columns passed, passed data had 3 columns. I am just reading the LIDAR data from the .csv file and plotting it. The only thing I can think of is that it is reading the [] as data points, but I don't see why it would do that. Any help deciphering this would be appreciated.

Python Code:

import numpy as np
import matplotlib.pyplot as plt
from math import sin, cos, radians
import multiprocessing as mp
import pandas as pd
import plotly.express as px
import csv

new_master=[]
def grab_plot():
    with open('lidar03.csv', 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            temp_list = []
            new_x = float(row[0])
            new_y = float(row[1])
            temp_list.append([new_x, new_y])
            new_master.append(row)
            if len(new_master) > 1000:
                df = pd.DataFrame(new_master, columns=['x', 'y'])
                fig = px.scatter(df, x='x', y="y", trendline="lowess")
                fig.show()
            else:
                print("err")


grab_plot()

lidar03.csv (just some example data; the real file is 280k lines):

-241.72250217077044,-399.5738128860572
-227.90134287289055,-396.9836777711814
-215.29533284661807,-396.0094470520418
-206.42379816517118,-402.9538162755934
-202.48907022573056,-417.7633767327136
-194.58213975978188,-473.043381611565
-139.37896911133979,-1391.7884413478437
-105.58002562367821,-1395.01034339868
-9.548104225177978,-1400.4674518551674
22.257610379315068,-1407.0739715026366
53.92438894226407,-1411.7204832321459
86.536304790659,-1414.8560767629965
119.66441166265868,-1416.7051531922336
151.09708809931834,-1418.9780371689715
185.17026611362976,-1424.51543429596
219.19089203051948,-1429.2905068077885
253.3941639959759,-1430.2264323011166
286.9286362218502,-1430.2529567233444

Upvotes: 1

Views: 1027

Answers (2)

Artyom Akselrod

Reputation: 976

Python for loops over rows are slow compared with pandas' built-in functions, so it is better to let pandas read the file directly.

I believe your code can become much simpler:

# your imports
import numpy as np
import matplotlib.pyplot as plt
from math import sin, cos, radians
import multiprocessing as mp
import pandas as pd
import plotly.express as px
import csv

def grab_plot():
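    # The sample file has no header row, so name the columns explicitly.
    df = pd.read_csv('lidar03.csv', header=None, names=['x', 'y'])
    if df.shape[0] > 1000: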
        fig = px.scatter(df, x='x', y="y", trendline="lowess")
        fig.show()
    else:
        print("err")

grab_plot()

Moreover, the loop

for row in reader:
   ...
   if len(new_master) > 1000:
       ...
       fig.show()

shows one figure for every row after the first 1000 rows. I don't think that is what you actually wanted; my code shows the figure just once.
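One thing to watch with read_csv on a file like the sample in the question: there is no header row, and by default pandas uses the first line as the column names. The sketch below is only an illustration under that assumption. It uses an in-memory string (a hypothetical stand-in for lidar03.csv, built from the first three sample rows) and compares the default behaviour with header=None plus names.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for lidar03.csv (first three sample rows).
sample = (
    "-241.72250217077044,-399.5738128860572\n"
    "-227.90134287289055,-396.9836777711814\n"
    "-215.29533284661807,-396.0094470520418\n"
)

# Default: the first data row is consumed as the header.
df_default = pd.read_csv(io.StringIO(sample))

# header=None keeps every row as data; names supplies the column labels.
df_named = pd.read_csv(io.StringIO(sample), header=None, names=["x", "y"])

print(list(df_default.columns), len(df_default))
print(list(df_named.columns), len(df_named))
```

With the default call there is no 'x' or 'y' column to plot and one point is lost to the header, so passing header=None and names is probably what you want here.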

Upvotes: 1

Garberchov

Reputation: 67

The problem was that I was appending row instead of calling new_master.append([new_x, new_y]). That fixed the problem. Full code:

import csv
import pandas as pd
import plotly.express as px

new_master = []
def grab_plot():
    with open('lidar03.csv', 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            # Keep only the two parsed coordinates, not the raw csv row.
            new_x = float(row[0])
            new_y = float(row[1])
            new_master.append([new_x, new_y])
            if len(new_master) > 1000:
                df = pd.DataFrame(new_master, columns=['x', 'y'])
                fig = px.scatter(df, x='x', y="y", trendline="lowess")
                fig.show()

grab_plot()
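For anyone hitting the same error, here is a small sketch of what I think was going on. It assumes that at least one line in the real file has a third field; the sample data above does not show one, so the two-line sample string below is made up for illustration. Appending the raw csv row keeps that extra field, while appending [new_x, new_y] always gives exactly two values per point. The sketch also builds the DataFrame once, after all rows are read.

```python
import csv
import io
import pandas as pd

# Made-up sample: the second line carries an extra third field.
sample = "1.0,2.0\n3.0,4.0,5.0\n"

raw_rows = list(csv.reader(io.StringIO(sample)))

# Passing the raw rows with two column names fails, because pandas
# sees three columns of data.
try:
    pd.DataFrame(raw_rows, columns=["x", "y"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Keeping only the first two fields, converted to float, always fits.
points = [[float(row[0]), float(row[1])] for row in raw_rows]
df = pd.DataFrame(points, columns=["x", "y"])

print(error_message)
print(df.shape)
```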

Upvotes: 0
