Trya Sovi Kartikasari

Reputation: 91

Combine two CSV files in Python, taking only a specified number of rows

I want to combine two CSV files, but not all of the data. For example, a.csv + b.csv, where b.csv has 20 rows. I want to take only the first 10 rows from b.csv, and then rows 11-20; that is, the first 10 and the second 10.

Then I want to insert the first 10 rows into a.csv, and the second 10 rows into a.csv too. My question is: how can I take only a specific number of rows?

Here is my code:

import pandas as pd

df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)

output=df1.append(df2)
output.to_csv("output.csv", sep=',')

I expected the result to contain only the rows I want, but the actual result combines all of the data.

Upvotes: 0

Views: 840

Answers (3)

Rolf of Saxony

Reputation: 22443

As mentioned in my comment, you can use the nrows parameter of read_csv to limit how many rows are read:

import pandas as pd

df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
output = pd.concat([df1, df2])
output.to_csv("output.csv", sep=',')

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more options
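
nrows on its own only covers the first ten rows. For rows 11-20, which the question also asks about, one option is to combine skiprows with nrows. This is a minimal sketch, assuming the file has a single header line; the in-memory CSV text and the column name "text" are hypothetical stand-ins for trainNegatif.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for trainNegatif.csv: one header line plus 20 data rows.
csv_text = "text\n" + "\n".join(f"row{i}" for i in range(1, 21)) + "\n"

# Rows 1-10: nrows counts data rows, not the header line.
first_ten = pd.read_csv(io.StringIO(csv_text), nrows=10)

# Rows 11-20: skip file lines 1-10 (line 0 is the header and is kept),
# then read the next ten data rows.
second_ten = pd.read_csv(
    io.StringIO(csv_text), skiprows=list(range(1, 11)), nrows=10
)
```

With a real file, pass the path in place of io.StringIO(csv_text). Either frame can then be combined with df1 using pd.concat.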

Upvotes: 0

wwii

Reputation: 23753

Without using pandas: read the lines of each file, add ten lines from one file's data to the other, and write the result to another file.

with open('a.csv') as f:
    data = f.readlines()
with open('b.csv') as f:
    bdata = f.readlines()

data.extend(bdata[:10])

# Bind the output file to f; otherwise f still refers to the closed b.csv handle
with open('output.csv', 'w') as f:
    f.writelines(data)

If the files are HUGE and you don't want to read the entire contents into memory, use some itertools functions.

import itertools
with open('a.csv') as a, open('b.csv') as b, open('output.csv', 'w') as out:
    first_ten = itertools.islice(b, 10)
    for line in itertools.chain(a, first_ten):
        out.write(line)

This assumes both files have the same number of columns. Note that if b.csv starts with a header line, that line counts as one of the ten.
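
If b.csv does have a header line and you want rows 11-20 instead of the first ten, islice also accepts start and stop positions. This is a sketch under a few assumptions: both files have exactly one header line, a.csv ends with a newline, and the temporary sample files and their contents are hypothetical.

```python
import itertools
import os
import tempfile

# Hypothetical sample files so the sketch runs on its own.
workdir = tempfile.mkdtemp()
a_path = os.path.join(workdir, "a.csv")
b_path = os.path.join(workdir, "b.csv")
out_path = os.path.join(workdir, "output.csv")

with open(a_path, "w") as f:
    f.write("text\na1\na2\n")
with open(b_path, "w") as f:
    f.write("text\n" + "".join(f"b{i}\n" for i in range(1, 21)))

with open(a_path) as a, open(b_path) as b, open(out_path, "w") as out:
    # Line 0 of b.csv is its header, so lines 11-20 are data rows 11-20.
    second_ten = itertools.islice(b, 11, 21)
    out.writelines(itertools.chain(a, second_ten))

# Read the result back to inspect it.
with open(out_path) as f:
    result = f.read().splitlines()
```

Using islice(b, 1, 11) instead would take data rows 1-10 while still skipping b.csv's header.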

Upvotes: 0

ipol

Reputation: 1

import pandas as pd
# Creating two dataframes with data that overlap, so we don't want all of the 'b' data.
# We want to strip off '3,4,5' as they exist in 'a' as well
# ----------Creating the data frames----------
a = [1,2,3,4,5]
b = [3,4,5,6,7,8,9,10]

dfa = pd.DataFrame(a)
dfa.to_csv('one.csv', index=False)

dfb = pd.DataFrame(b)
dfb.to_csv('two.csv', index = False)
# ---------------------------------------------

# --------Reading through the dataframes-------
one = pd.read_csv('one.csv')
two = pd.read_csv('two.csv')
# ---------------------------------------------

# Drop the first 3 rows of 'two' (3, 4, 5), which already exist in 'one'.
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
output = pd.concat([one, two[3:]])
output.to_csv("output.csv", sep=',', index=False)
# ---------------------------------------------

I hope this answers your question. The important part for you is output = pd.concat([one, two[3:]]). There are more sophisticated ways to do the same thing, but this is the simplest.
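
The same kind of slice as two[3:] can pick out the first ten and the second ten rows asked about in the question. This is a minimal sketch; the frames a and b and the column name "value" are hypothetical stand-ins for the data read from a.csv and b.csv.

```python
import pandas as pd

# Hypothetical frames standing in for a.csv (3 rows) and b.csv (20 rows).
a = pd.DataFrame({"value": [1, 2, 3]})
b = pd.DataFrame({"value": range(101, 121)})

# Positional slices: rows 1-10 and rows 11-20 of b.
first_ten = b.iloc[:10]
second_ten = b.iloc[10:20]

# ignore_index=True renumbers the combined rows from 0.
out_first = pd.concat([a, first_ten], ignore_index=True)
out_second = pd.concat([a, second_ten], ignore_index=True)
```

Each result can then be written with to_csv(..., index=False), as in the code above.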

Upvotes: 0
