DevKim

[ Webtooniverse ] Building the DB, Round 2 - Crawling Kakao Webtoon


on_doing 2021. 8. 5. 11:56

Let's collect the webtoon data at the heart of Webtooniverse


 

Kakao Webtoon, which I had been waiting and waiting for, has finally launched!

I originally planned to pull the data from Daum Webtoon, but after reading an article saying it would be merged into Kakao Webtoon, I decided to wait for the launch.

My first thought on seeing the site was: ah, this is not going to be easy to crawl.


[ Data to collect ]

The webtoon categories to crawl are as follows.

1. Webtoon originals - information on each weekday's titles

2. Novel adaptations - information on each weekday's titles

3. Completed webtoons - 50 webtoon originals + 50 novel adaptations


[ Site analysis issues ]

"Unlike the Naver Webtoon site, Kakao Webtoon was hard to automate end to end with selenium."

1. As a glance at the site shows, video thumbnails are mixed in among the cards, and video thumbnails use different tags from static ones.

2. Adult webtoons cannot be clicked without logging in.

3. On most sites nth-child values increase sequentially (1, 2, 3, 4, ...), but on Kakao Webtoon they were often out of order.

4. On the webtoon originals tab the video thumbnails sit at consistent positions, but on the novel adaptations tab their positions are irregular.

5. I need to collect each title's details, but for some webtoons clicking the title starts a video.
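
Issue 1 can be illustrated with a minimal sketch that uses only the standard library's html.parser. The sample markup is invented for illustration; it only mirrors the two shapes the selectors in this post rely on (a video tag with a poster attribute, and a picture tag wrapping an img with a src attribute), not Kakao Webtoon's real HTML. The class name ThumbnailParser is hypothetical.

```python
from html.parser import HTMLParser


class ThumbnailParser(HTMLParser):
    """Collect one thumbnail URL per card, whether it is a video or an image."""

    def __init__(self):
        super().__init__()
        self.thumbnails = []  # (kind, url) tuples in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video" and attrs.get("poster"):
            # Video cards keep their still image in the `poster` attribute.
            self.thumbnails.append(("video", attrs["poster"]))
        elif tag == "img" and attrs.get("src"):
            # Static cards use an ordinary <img src="...">.
            self.thumbnails.append(("image", attrs["src"]))


# Hypothetical markup, for illustration only.
SAMPLE = """
<div><a href="/content/1"><video poster="https://example.com/a.jpg"></video></a></div>
<div><a href="/content/2"><picture><img src="https://example.com/b.jpg"></picture></a></div>
"""

parser = ThumbnailParser()
parser.feed(SAMPLE)
print(parser.thumbnails)
```

Handling both tag shapes in one place is an alternative to keeping one method per thumbnail type; whether it fits depends on how different the surrounding markup really is.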

 

Novel adaptations

 


[ Approach ]

Getting at the webtoon details was also difficult at first, but the title, author, and genre information turned out to be available from the head tag. Fortunately, the inner tags were identical.
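
Positional selectors such as 'head > meta:nth-child(33)' stop working as soon as the number of meta tags changes. A rough sketch of reading meta tags by attribute instead, again with only the standard library. The keys used here (og:title, description) are assumptions and have not been checked against Kakao Webtoon's pages; MetaParser is a hypothetical name.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Map each <meta> tag's property/name attribute to its content."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        if key and attrs.get("content") is not None:
            self.meta[key] = attrs["content"].strip()


# Hypothetical <head>; the keys are assumed, not taken from the real site.
SAMPLE_HEAD = """
<head>
  <meta charset="utf-8">
  <meta name="description" content="A sample synopsis.">
  <meta property="og:title" content=" Sample Title ">
</head>
"""

meta_parser = MetaParser()
meta_parser.feed(SAMPLE_HEAD)
print(meta_parser.meta["og:title"])     # title with whitespace stripped
print(meta_parser.meta["description"])  # synopsis
```

With selenium the same idea would be a CSS attribute selector such as meta[property="og:title"], provided the page actually exposes that tag.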

 

1. Skip adult webtoons.

   1-1. The site I am building cannot show content that Kakao Webtoon itself restricts anyway.

   1-2. Adult webtoons are replaced with the message 'Please log in on Kakao Page to view'.

2. Handle moving (video) thumbnails and static image thumbnails in separate methods.

3. The nth-child order has to be checked by hand for each weekday; there is no way around it.

4. Because a video may start playing as soon as a detail page is opened, give time.sleep a generous 30 seconds.
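
A fixed 30-second time.sleep is paid in full on every page, even when the page is ready much sooner. One common alternative is to poll for a condition and stop as soon as it holds. The sketch below is a generic, standard-library version of that idea, not what this post's crawler does; wait_until and page_ready are hypothetical names, and page_ready merely stands in for a check such as "the detail page's meta tags are present". Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for the same purpose.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for "the detail page has loaded": becomes true on the third check.
calls = {"count": 0}


def page_ready():
    calls["count"] += 1
    return calls["count"] >= 3


wait_until(page_ready, timeout=5, interval=0.01)
print(calls["count"])  # 3
```

The 30-second ceiling is kept as the timeout, so slow pages behave as before while fast pages no longer wait the full duration.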


[ Code ]

1. Webtoons with video thumbnails

- m : number for the weekday (Mon=1, Tue=2, ...)

- day : weekday as a string (월, 화, 수, ..., i.e. Mon, Tue, Wed, ...)

Because the webtoon-genre FK rows had to be inserted as well, the webtoon PK value also has to be checked against the DB continually.

def week_toon_move(m, day):
    global index
    # ===================== Click the webtoon with a moving (video) thumbnail ===================== #
    # Thumbnail (a video card keeps its still image in the `poster` attribute)
    toon_img = driver.find_element_by_css_selector(
        f'#root > main > div > div.page.color_bg_black__2MXm7.activePage > div.swiper-container.swiper-container-initialized.swiper-container-horizontal.swiper-container-pointer-events > div > div.swiper-slide.swiper-slide-active > div > div > div > div > div > div:nth-child({m}) > div.Masonry_masonry__38RyV > div:nth-child(1) > div > div > div > div > div > a > video'
    ).get_attribute('poster')

    driver.find_element_by_css_selector(
        f'#root > main > div > div.page.color_bg_black__2MXm7.activePage > div.swiper-container.swiper-container-initialized.swiper-container-horizontal.swiper-container-pointer-events > div > div.swiper-slide.swiper-slide-active > div > div > div > div > div > div:nth-child({m}) > div.Masonry_masonry__38RyV > div:nth-child(1) > div > div > div > div > div > a > video'
    ).click()

    time.sleep(23)

    # Title
    toon_title=driver.find_element_by_css_selector('head > meta:nth-child(33)').get_attribute('content').strip()

    # Author: drop the first and last comma-separated items, join the rest with ' / '
    toon_author=driver.find_element_by_css_selector('head > meta:nth-child(27)').get_attribute('content')
    temp_List=toon_author.split(',')
    temp_List=temp_List[1:-1]
    toon_author=' / '.join(temp_List).strip()

    # Description
    toon_content=driver.find_element_by_css_selector('head > meta:nth-child(26)').get_attribute('content')

    # Weekday
    toon_weekday = day

    # URL of the actual detail page
    real_url = driver.current_url

    # Platform ('카카오' = Kakao)
    toon_platform = '카카오'

    # Whether the series is finished
    finished = False

    # Genre
    genre = driver.find_element_by_css_selector(
        '#root > main > div > div.page.color_bg_black__2MXm7.activePage > div > div.Content_homeWrapper__2CMgX.common_positionRelative__2kMrZ > div.Content_metaWrapper__3srNJ > div.Content_contentMainWrapper__3AlhK.Content_current__2yPD8 > div.spacing_pb_28__VqvVT.spacing_pt_96__184F4 > div.common_positionRelative__2kMrZ.spacing_mx_a__2yxXH.spacing_my_0__1f7t6.MaxWidth_maxWidth__2Qvbl > div.Meta_meta__1HmBY.spacing_mx_20__17RDr.spacing_mt_16__29c-N > div > div > p.Text_default__HZL19.textVariant_s13_regular_white__1-AxN.spacing_ml_3__2NL9t.opacity_opacity85__gH87s').text

    data.append(
        [toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, None, toon_platform, finished,genre])

    print([toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, None, toon_platform, finished,genre])

    sql = "INSERT INTO webtoon (toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, toon_age, toon_platform, finished,toon_avg_point,review_count,total_point_count) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s,%s,%s,%s)"
    val = (toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, None, toon_platform, finished,0,0,0)
    cur.execute(sql, val)
    conn.commit()


    genre_List=dic[genre]

    for genre_id in genre_List:
        sql = "INSERT INTO webtoon_genre(toon_id,genre_id) VALUES (%s, %s)"
        val = (index,genre_id)
        cur.execute(sql, val)
        conn.commit()

    index+=1

    driver.back()
    time.sleep(2)
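
The function above tracks the webtoon primary key in a global index counter, which has to be kept in sync with the database by hand. DB-API cursors expose lastrowid after an INSERT, so the generated key can be read back instead. The sketch below uses the standard library's sqlite3 with an in-memory database so that it runs anywhere; the schema is a trimmed-down assumption (the real tables have more columns), save_webtoon is a hypothetical helper, and sqlite3 takes ? placeholders where the code in this post uses %s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed, simplified schema for illustration only.
cur.execute(
    "CREATE TABLE webtoon (toon_id INTEGER PRIMARY KEY AUTOINCREMENT, toon_title TEXT)"
)
cur.execute("CREATE TABLE webtoon_genre (toon_id INTEGER, genre_id INTEGER)")


def save_webtoon(cur, title, genre_ids):
    """Insert one webtoon and its genre links; return the generated toon_id."""
    cur.execute("INSERT INTO webtoon (toon_title) VALUES (?)", (title,))
    toon_id = cur.lastrowid  # key generated by the INSERT just above
    cur.executemany(
        "INSERT INTO webtoon_genre (toon_id, genre_id) VALUES (?, ?)",
        [(toon_id, genre_id) for genre_id in genre_ids],
    )
    return toon_id


first = save_webtoon(cur, "Sample A", [1, 4])
second = save_webtoon(cur, "Sample B", [2])
conn.commit()  # a single commit for the batch instead of one per row

rows = cur.execute(
    "SELECT toon_id, genre_id FROM webtoon_genre ORDER BY toon_id, genre_id"
).fetchall()
print(rows)
```

Reading the key back from the cursor removes the need to compare the counter with the DB after a failed or skipped insert.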

 

2. Webtoons with image thumbnails

- m : number for the weekday

- day : weekday name

- start : nth-child(start)

- end : nth-child(end)

On the webtoon originals tab the nth-child numbers seem to increase like 2, 3, 5, 7, 8, ...

Once I had worked out the pattern, I split it into 2, 3, 5 on one side and 7 through the end on the other, moved each into its own method, and ran the code.

Adult comics can appear in the middle of the list and must be skipped, so that case has to be handled as well.
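
The irregular positions described here (2, 3, 5, then 7 through the end, minus any adult titles) could also be generated in one place. This is only a sketch: card_positions and the skip set are hypothetical, the end position 12 is made up, and which positions hold adult titles still has to be looked up by hand for each weekday.

```python
def card_positions(ranges, skip=()):
    """Yield nth-child positions from inclusive (start, end) ranges, minus `skip`."""
    skipped = set(skip)
    for start, end in ranges:
        for position in range(start, end + 1):
            if position not in skipped:
                yield position


# Example: 2, 3, 5, then 7..12, where position 9 is assumed to be an adult title.
positions = list(card_positions([(2, 3), (5, 5), (7, 12)], skip={9}))
print(positions)  # [2, 3, 5, 7, 8, 10, 11, 12]
```

Each yielded position could then be passed as both start and end to a function like week_toon_range, which keeps the per-weekday quirks in data rather than in separate methods.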

def week_toon_range(m, day, start,end):
    global index

    for i in range(start, end + 1):

        # Thumbnail
        toon_img = driver.find_element_by_css_selector(
            f'#root > main > div > div.page.color_bg_black__2MXm7.activePage > div.swiper-container.swiper-container-initialized.swiper-container-horizontal.swiper-container-pointer-events > div > div.swiper-slide.swiper-slide-active > div > div > div > div > div > div:nth-child({m}) > div.Masonry_masonry__38RyV > div:nth-child({i}) > div > div > div > div > a > picture > img'
        ).get_attribute('src')


        # Click the webtoon
        driver.find_element_by_css_selector(
            f'#root > main > div > div.page.color_bg_black__2MXm7.activePage > div.swiper-container.swiper-container-initialized.swiper-container-horizontal.swiper-container-pointer-events > div > div.swiper-slide.swiper-slide-active > div > div > div > div > div > div:nth-child({m}) > div.Masonry_masonry__38RyV > div:nth-child({i}) > div > div > div > div > a'
        ).click()

        time.sleep(30)

        # Skip adult webtoons (currently commented out)
        # try:
        #     driver.find_element_by_css_selector(
        #         'body > div:nth-child(21) > div > div > div >div.common_positionAbsolute__3eY3C.common_widthFull__1hw6a.spacing_px_20__1gg7C.Alert_buttonsWrap__2fPaV >button').click()
        #     time.sleep(3)

        # Webtoon details
        # Title
        toon_title=driver.find_element_by_css_selector('head > meta:nth-child(32)').get_attribute('content').strip()

        # Author: drop the first and last comma-separated items, join the rest with ' / '
        toon_author = driver.find_element_by_css_selector('head > meta:nth-child(27)').get_attribute('content')
        temp_List = toon_author.split(',')
        temp_List = temp_List[1:-1]
        toon_author = ' / '.join(temp_List).strip()

        # Description
        toon_content = driver.find_element_by_css_selector('head > meta:nth-child(26)').get_attribute('content')

        # Weekday
        toon_weekday = day

        # Age rating (not collected)
        # toon_age

        # URL of the actual detail page
        real_url = driver.current_url

        # Platform ('카카오' = Kakao)
        toon_platform = '카카오'

        # Whether the series is finished
        finished = False

        # Genre
        genre = driver.find_element_by_css_selector(
            '#root > main > div > div.page.color_bg_black__2MXm7.activePage > div > div.Content_homeWrapper__2CMgX.common_positionRelative__2kMrZ > div.Content_metaWrapper__3srNJ > div.Content_contentMainWrapper__3AlhK.Content_current__2yPD8 > div.spacing_pb_28__VqvVT.spacing_pt_96__184F4 > div.common_positionRelative__2kMrZ.spacing_mx_a__2yxXH.spacing_my_0__1f7t6.MaxWidth_maxWidth__2Qvbl > div.Meta_meta__1HmBY.spacing_mx_20__17RDr.spacing_mt_16__29c-N > div > div > p.Text_default__HZL19.textVariant_s13_regular_white__1-AxN.spacing_ml_3__2NL9t.opacity_opacity85__gH87s').text


        data.append([toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, None, toon_platform,
                     finished,genre])
        print([toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, None, toon_platform,
               finished,genre])

        sql = "INSERT INTO webtoon (toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, toon_age, toon_platform, finished,toon_avg_point,review_count,total_point_count) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s,%s,%s,%s)"
        val = (toon_title, toon_author, toon_content, toon_img, toon_weekday, real_url, None, toon_platform, finished,0,0,0)
        cur.execute(sql, val)
        conn.commit()

        genre_List = dic[genre]

        for genre_id in genre_List:
            sql = "INSERT INTO webtoon_genre(toon_id,genre_id) VALUES (%s, %s)"
            val = (index,genre_id)
            cur.execute(sql, val)
            conn.commit()

        index+=1
        driver.back()
        time.sleep(3)
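
Both functions build the author string by splitting a comma-separated meta value and dropping its first and last items. A small worked example of that step follows. The sample value is invented; the real format of the meta content is only inferred from the slicing in the code. parse_authors is a hypothetical helper, and unlike the original it strips each part rather than only the joined string.

```python
def parse_authors(meta_content):
    """Drop the first and last comma-separated items and join the rest with ' / '."""
    parts = [part.strip() for part in meta_content.split(",")]
    return " / ".join(parts[1:-1])


# Hypothetical meta value: a leading title, a trailing site name, authors in between.
sample = "Sample Title, Writer Kim, Artist Lee, Kakao Webtoon"
print(parse_authors(sample))  # Writer Kim / Artist Lee
```

Note that a value with fewer than three items yields an empty string, so a check for an empty author before inserting may be worth adding.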

 

[ Results ]

The road here was rough, but..

the results are stored in the db correctly.

The join table linking genres and webtoons is populated as well.
