[Python] Reading a CSV File and Sorting Images

While organizing image data for deep learning, I ran into the following situation.

There are 6,192 images, and their metadata is collected in a CSV file. Opening the CSV file shows, for each row, the file name and a mark indicating which condition that file represents.

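For training, it would be enough to load columns B through E and pass them in as the labels, but the data generator and models I have been using all rely on datagen.flow_from_directory, so I want to sort the images into one folder per class instead.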

import pandas as pd
import os
import shutil

 

I used three libraries for this.

  • pandas: used to open the CSV file
  • os: used to create the new directories
  • shutil: used to move the files

img_list=pd.read_csv('C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/train.csv')

img_list.columns

img_list.keys()
img_list.values
img_list.values[0][0] # file name
img_list.values[0][1] # columns 1-4 are the class flags: 1 healthy, 2 multiple_diseases, 3 rust, 4 scab

len(img_list.index.values) # 6192 rows in total

Loading the CSV file and inspecting it gives the following results.

>>>img_list=pd.read_csv('C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/train.csv')

>>>img_list.columns
Index(['image_id', 'healthy', 'multiple_diseases', 'rust', 'scab'], dtype='object')

>>>img_list.keys()
Index(['image_id', 'healthy', 'multiple_diseases', 'rust', 'scab'], dtype='object')

>>>img_list.values
array([['Train_0', 0, 0, 0, 1],
       ['Train_1', 0, 1, 0, 0],
       ['Train_2', 1, 0, 0, 0],
       ...,
       ['train_add_r_273', 0, 0, 1, 0],
       ['train_add_r_274', 0, 0, 1, 0],
       ['train_add_r_275', 0, 0, 1, 0]], dtype=object)

>>>img_list.values[0][0] # file name
'Train_0'

>>>img_list.values[0][1] # columns 1-4 are the class flags: 1 healthy, 2 multiple_diseases, 3 rust, 4 scab
0

>>>len(img_list.index.values) # 6192 rows in total
6192

It looks like I can read the values, create a folder for each class name, and sort the files into the matching folders.
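Since each row carries exactly one flag set to 1 in the rows shown above, the class of an image can also be read off directly as the name of the flagged column. The following is only a minimal sketch of that idea: the three sample rows are copied from the output above into an in-memory CSV, and the names sample_csv, class_columns, and label_by_image are my own, not part of the original script.

```python
import io

import pandas as pd

# Three rows copied from the output above, held in memory instead of train.csv.
sample_csv = io.StringIO(
    "image_id,healthy,multiple_diseases,rust,scab\n"
    "Train_0,0,0,0,1\n"
    "Train_1,0,1,0,0\n"
    "Train_2,1,0,0,0\n"
)
df = pd.read_csv(sample_csv)

# Every column after image_id is a class flag.
class_columns = list(df.columns[1:])

# Check the assumption that each row has exactly one flag set.
one_flag_per_row = bool((df[class_columns].sum(axis=1) == 1).all())

# idxmax(axis=1) returns, per row, the name of the column holding the largest value,
# which is the flagged class when exactly one flag is 1.
labels = df[class_columns].idxmax(axis=1)
label_by_image = dict(zip(df["image_id"], labels))
print(label_by_image)
```

If a row could have several flags set, idxmax would silently pick the first one, so the one_flag_per_row check is worth keeping before relying on the labels.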

 

path = 'C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/'
img_type = '.jpg'

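for a in range(4):
    try:
        # keys()[0] is 'image_id', so the class names start at index 1
        os.mkdir(path + img_list.keys()[a+1])
    except FileExistsError:
        # the folder already exists, so move on to the next one
        pass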

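I set the path and the image extension, then create the folders with a loop. Because the CSV was loaded with its header row, keys() holds the column names; the first one is image_id, so the class names start at index 1, which is why the loop uses a+1.

os.mkdir raises an error if the folder already exists, so the call is wrapped in try ~ except. That way the loop keeps going and every folder gets created even when only some of them already exist.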
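As an alternative to wrapping os.mkdir in try ~ except, the standard library's os.makedirs accepts exist_ok=True, which treats an existing folder as success. The sketch below runs in a temporary directory rather than the real dataset path, and the class_names list is typed in by hand from the column names shown earlier.

```python
import os
import tempfile

# Class names as they appear in the CSV header (typed in by hand for this sketch).
class_names = ["healthy", "multiple_diseases", "rust", "scab"]

with tempfile.TemporaryDirectory() as base_dir:
    # Pretend one of the folders is already there from an earlier run.
    os.mkdir(os.path.join(base_dir, "rust"))

    for name in class_names:
        # exist_ok=True: no error is raised if the folder already exists.
        os.makedirs(os.path.join(base_dir, name), exist_ok=True)

    created = sorted(os.listdir(base_dir))

print(created)
```

Unlike a bare except, this still raises for genuine problems such as a missing permission or a file occupying the folder name.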

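src = path + 'images/'  # folder that currently holds every image
for a in range(len(img_list.index.values)):
    for b in range(1, 5):  # columns 1-4 are the class flags
        if img_list.values[a][b] == 1:
            filename = img_list.values[a][0] + img_type
            dst = path + img_list.keys()[b] + '/'  # folder named after the class
            try:
                shutil.move(src + filename, dst + filename)
                print(src + filename + "," + dst + filename)
            except FileNotFoundError:
                # the file was already moved on an earlier run
                pass
print("##move end##")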

 

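The loop runs once per entry in values, that is, once per file record.

shutil.move('source file path', 'destination file path') moves a file.

On each iteration, the code reads which of classes 1 to 4 the file belongs to and moves it to the matching folder. If the file has already been moved and no longer exists at the source, an error is raised, so try ~ except lets the loop skip it. Each time a file is moved, the old path and the new path are printed.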
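The skip-if-already-moved behavior can also be written as an explicit check instead of catching the error. This is just a sketch under my own assumptions: move_if_present is a hypothetical helper name, and the demo uses an empty placeholder file in a temporary directory rather than a real image.

```python
import os
import shutil
import tempfile


def move_if_present(src_path, dst_path):
    """Move src_path to dst_path; return False when the source is already gone."""
    if not os.path.exists(src_path):
        return False  # nothing to do, e.g. the file was moved on an earlier run
    shutil.move(src_path, dst_path)
    return True


# Demo in a throwaway directory with one empty placeholder "image".
with tempfile.TemporaryDirectory() as base_dir:
    os.makedirs(os.path.join(base_dir, "images"))
    os.makedirs(os.path.join(base_dir, "scab"))
    src = os.path.join(base_dir, "images", "Train_0.jpg")
    dst = os.path.join(base_dir, "scab", "Train_0.jpg")
    open(src, "wb").close()

    first_run = move_if_present(src, dst)   # the file is moved
    second_run = move_if_present(src, dst)  # the source is gone, so it is skipped
    moved_ok = os.path.exists(dst)

print(first_run, second_run, moved_ok)
```

Returning a boolean makes it easy to count how many files were actually moved versus skipped, which the silent pass in a try ~ except does not show.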

C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_583.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_583.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_584.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_584.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_585.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_585.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_586.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_586.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_587.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_587.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_588.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_588.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_589.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_589.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_590.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_590.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_591.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_591.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_592.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_592.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_593.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_593.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_594.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_594.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_595.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_595.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_596.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_596.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_597.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_597.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_598.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_598.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_599.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_599.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_600.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_600.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_601.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_601.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_602.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_602.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_603.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_603.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_604.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_604.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_605.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_605.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_606.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_606.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_607.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_607.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_608.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_608.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_609.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_609.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_610.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_610.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_611.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_611.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_612.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_612.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_613.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_613.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_614.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_614.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_615.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_615.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_616.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_616.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_617.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_617.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_618.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_618.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_619.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_619.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_620.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_620.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_621.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_621.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_622.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_622.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_623.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_623.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_624.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_624.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_625.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_625.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_626.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_626.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_627.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_627.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_628.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_628.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_629.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_629.jpg
C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/images/train_add_s_630.jpg,C:/Users/Admin/Desktop/plant_Pathology_Dataset and Apple Leaf_data/scab/train_add_s_630.jpg

 

Each line shows the original path before the comma and the new path after it.

 

You can see that the images have been sorted correctly.
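One way to double-check the result beyond looking at the folders is to count the files in each class folder. The following is a rough sketch, not something from the original script: count_images_per_class is a hypothetical helper, and the demo builds a tiny fake folder tree with empty placeholder files.

```python
import os
import tempfile


def count_images_per_class(base_dir, class_names, extension=".jpg"):
    """Return {class_name: number of files with the given extension}."""
    counts = {}
    for name in class_names:
        folder = os.path.join(base_dir, name)
        counts[name] = sum(
            1 for entry in os.listdir(folder)
            if entry.lower().endswith(extension)
        )
    return counts


class_names = ["healthy", "multiple_diseases", "rust", "scab"]

# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as base_dir:
    for name in class_names:
        os.makedirs(os.path.join(base_dir, name))
    for file_name in ("Train_0.jpg", "train_add_s_583.jpg"):
        open(os.path.join(base_dir, "scab", file_name), "wb").close()
    open(os.path.join(base_dir, "healthy", "Train_2.jpg"), "wb").close()

    demo_counts = count_images_per_class(base_dir, class_names)

print(demo_counts)
```

On the real data, these counts could be compared with the per-column sums of the CSV, for example img_list.iloc[:, 1:].sum(), and the totals should add up to the 6192 rows if every image was found and moved.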