Reputation: 91
I want to combine data from two CSV files, but not all of it. For example, given a.csv and b.csv, where b.csv has 20 rows, I want to take only the first 10 rows, and then rows 11-20 (that is, the first 10 and the second 10).
Then I want to append the first 10 rows to a.csv, and the second 10 rows to a.csv as well. My question is: how can I take only a specific number of rows?
Here is my code:
import pandas as pd
df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)
output=df1.append(df2)
output.to_csv("output.csv", sep=',')
I expected only the rows I selected to be combined, but the actual result combines all of the data.
Upvotes: 0
Views: 840
Reputation: 22443
As mentioned in my comment, you can use the nrows parameter of read_csv to limit how many rows are read:
import pandas as pd
df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
output = pd.concat([df1, df2])
output.to_csv("output.csv", sep=',')
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more options
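For the second half of the question (rows 11-20), one possible approach is to combine nrows with the skiprows parameter of read_csv. The sketch below assumes each file has a single header line; the helper names read_row_block and append_row_block are made up for illustration and are not part of pandas.

```python
import pandas as pd


def read_row_block(path, start, count):
    """Read `count` data rows from a CSV, beginning at 0-based data row `start`.

    Assumes line 0 of the file is the header, so data row i sits on file line i + 1.
    """
    # Skip file lines 1..start (the data rows before the block), keeping the header.
    skip = range(1, start + 1) if start > 0 else None
    return pd.read_csv(path, skiprows=skip, nrows=count)


def append_row_block(base_path, extra_path, start, count):
    """Return the rows of base_path followed by one block of rows from extra_path."""
    base = pd.read_csv(base_path)
    block = read_row_block(extra_path, start, count)
    return pd.concat([base, block], ignore_index=True)
```

With these helpers, append_row_block('testNegatif.csv', 'trainNegatif.csv', 0, 10) would give the first ten rows appended, and passing start=10 instead would give rows 11-20.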
Upvotes: 0
Reputation: 23753
Without using Pandas. Read the lines of each file; add ten lines from one file's data to the other; write the result to another file.
with open('a.csv') as f:
    data = f.readlines()
with open('b.csv') as f:
    bdata = f.readlines()
# Append the first ten lines of b.csv (this includes its header line, if it has one)
data.extend(bdata[:10])
with open('output.csv', 'w') as f:
    f.writelines(data)
If the files are HUGE and you don't want to read the entire contents into memory, use some itertools functions.
import itertools
with open('a.csv') as a, open('b.csv') as b, open('output.csv', 'w') as out:
    first_ten = itertools.islice(b, 10)
    for line in itertools.chain(a, first_ten):
        out.write(line)
Assumes both files have the same number of columns.
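The same streaming idea can pick out the second ten rows, since itertools.islice also accepts a start and a stop. Below is a rough sketch, assuming one record per line (no quoted fields spanning several lines); the function name append_rows and the has_header flag are hypothetical.

```python
import itertools


def append_rows(a_path, b_path, out_path, start, stop, has_header=True):
    """Write all of a_path, then data rows start..stop-1 of b_path, to out_path."""
    with open(a_path) as a, open(b_path) as b, open(out_path, 'w') as out:
        if has_header:
            next(b, None)  # drop b's header so it is not repeated in the output
        for line in itertools.chain(a, itertools.islice(b, start, stop)):
            # Guard against a final line that lacks a trailing newline
            out.write(line if line.endswith('\n') else line + '\n')
```

Calling append_rows('a.csv', 'b.csv', 'output.csv', 10, 20) would append rows 11-20 of b.csv; using 0 and 10 would append the first ten instead.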
Upvotes: 0
Reputation: 1
import pandas as pd
import numpy as np
# Creating two dataframes with data that overlap, so we don't want all of the 'b' data.
# We want to strip off '3,4,5' as they exist in 'a' as well
# ----------Creating the data frames----------
a = [1,2,3,4,5]
b = [3,4,5,6,7,8,9,10]
dfa = pd.DataFrame(a)
dfa.to_csv('one.csv', index=False)
dfb = pd.DataFrame(b)
dfb.to_csv('two.csv', index = False)
# ---------------------------------------------
# --------Reading through the dataframes-------
one = pd.read_csv('one.csv')
two = pd.read_csv('two.csv')
# ---------------------------------------------
# Drop the first 3 rows of 'two' (3, 4, 5), which already exist in 'one'
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
output = pd.concat([one, two[3:]])
output.to_csv("output.csv", sep=',', index=False)
# ---------------------------------------------
I hope this answers your question. The important part for you is output = pd.concat([one, two[3:]]). There are more sophisticated ways to do the same thing, but this is the simplest.
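As one example of a less hard-coded variant, the overlapping rows could be filtered out by value rather than by position, so the number of rows to skip does not need to be known in advance. This is only a sketch: the column name 'value' is an assumption chosen for illustration, and the frames are built in memory instead of being read from one.csv and two.csv.

```python
import pandas as pd

# Same sample data as above, with a hypothetical column name 'value'
one = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
two = pd.DataFrame({'value': [3, 4, 5, 6, 7, 8, 9, 10]})

# Keep only the rows of 'two' whose value does not already occur in 'one'
new_rows = two[~two['value'].isin(one['value'])]

# Stack the two frames and renumber the index from 0
output = pd.concat([one, new_rows], ignore_index=True)
```

Unlike a positional slice, this does not depend on the overlapping values sitting at the start of the second file.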
Upvotes: 0