Finding multiple occurrences of a string within a string in Python

How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>> 

So the first occurrence of ll is at index 1, as expected. How do I find the next occurrence?

The same question applies to a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all the ll entries along with their indexes?

user225312 asked 2019-10-06T14:09:09Z
17 answers
95 votes

Using regular expressions, you can use re.finditer to find all (non-overlapping) occurrences:

>>> import re
>>> text = 'Allowed Hello Hollow'
>>> for m in re.finditer('ll', text):
         print('ll found', m.start(), m.end())

ll found 1 3
ll found 10 12
ll found 16 18
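
If overlapping occurrences are wanted as well, one possible approach is to wrap the pattern in a zero-width lookahead so that each match consumes no characters. The sketch below is an illustration only; the helper name find_overlapping is an assumption and is not part of the original answer.

```python
import re

def find_overlapping(sub, text):
    """Return the start index of every occurrence of sub, overlaps included.

    find_overlapping is a hypothetical helper name used for illustration.
    """
    # A lookahead matches without consuming characters, so the scan moves
    # forward one position at a time instead of skipping past each match.
    # re.escape keeps characters such as '.' from being treated as regex syntax.
    pattern = re.compile('(?=' + re.escape(sub) + ')')
    return [m.start() for m in pattern.finditer(text)]

print(find_overlapping('ll', 'Alllowed Hello Holllow'))  # [1, 2, 11, 17, 18]
```

With a plain 'll' pattern, re.finditer would report only one match per run of three l characters, because each match consumes the text it covers.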

Alternatively, if you don't want to use regular expressions, you can call str.find repeatedly to get the next index:

>>> text = 'Allowed Hello Hollow'
>>> index = 0
>>> while index < len(text):
        index = text.find('ll', index)
        if index == -1:
            break
        print('ll found at', index)
        index += 2 # +2 because len('ll') == 2

ll found at 1
ll found at 10
ll found at 16

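The same idea also works for lists and other sequences, although they provide index (which raises ValueError when nothing is found) rather than find.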
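
As a rough sketch of how the same loop might look for a list, the version below relies on list.index, which accepts a start position but raises ValueError instead of returning -1. The function name list_indexes is an assumption made for this example.

```python
def list_indexes(items, target):
    """Yield every index at which target occurs in the list items.

    list_indexes is a hypothetical name, not taken from the answer.
    """
    start = 0
    while True:
        try:
            # Resume the search just after the previous hit.
            start = items.index(target, start)
        except ValueError:
            # No further occurrence: end the generator.
            return
        yield start
        start += 1

x = ['ll', 'ok', 'll']
print(list(list_indexes(x, 'll')))  # [0, 2]
```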

poke answered 2019-10-06T14:09:38Z
23 votes

I think what you are looking for is str.count:

>>> "Allowed Hello Hollow".count('ll')
3

Hope this helps.
Note: this only counts non-overlapping occurrences.
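
To make the non-overlapping caveat concrete, here is a small sketch that counts overlapping occurrences by advancing one character after each hit. The name count_overlapping is an assumption for illustration.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps.

    count_overlapping is a hypothetical helper, not a str method.
    """
    if not sub:
        raise ValueError("sub must be non-empty")
    count = 0
    index = text.find(sub)
    while index != -1:
        count += 1
        # Step forward by one character only, so overlapping hits are seen.
        index = text.find(sub, index + 1)
    return count

print("Alllowed".count('ll'))               # 1 (non-overlapping)
print(count_overlapping("Alllowed", 'll'))  # 2 (overlapping)
```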

inspectorG4dget answered 2019-10-06T14:10:16Z
19 votes

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]

This also reports the overlapping matches inside adjacent runs of "ll", which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]
bstpierre answered 2019-10-06T14:11:01Z
12 votes

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and catches ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second uses str.find and checks for the sentinel value -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    return iter(next_index(len(sub)).__next__, -1)

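To apply either of these functions to a list, tuple, or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments), like this: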

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)
intuited answered 2019-10-06T14:11:55Z
3 votes

For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print(idx, value)
0 ll
2 ll

If you want all the items in the list that contain 'll', you can do that too:

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print(idx, value)
   ...:
0 Allowed
1 Hello
3 Hollow
chauncey answered 2019-10-06T14:12:30Z
2 votes
>>> for n,c in enumerate(text):
...   try:
...     if c+text[n+1] == "ll": print(n)
...   except IndexError: pass
...
1
10
16
ghostdog74 answered 2019-10-06T14:12:59Z
1 votes

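I'm new to programming in general and am working through an online tutorial. I was asked to do this as well, but only using the methods I had learned so far (basically strings and loops). I'm not sure whether this adds any value here, and I know this isn't how you would normally do it, but I got it to work with this: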

needle = input()
haystack = input()
counter = 0
n=-1
for i in range (n+1,len(haystack)+1):
   for j in range(n+1,len(haystack)+1):
      n=-1
      if needle != haystack[i:j]:
         n = n+1
         continue
      if needle == haystack[i:j]:
         counter = counter + 1
print (counter)
Aaron Semeniuk answered 2019-10-06T14:13:26Z
0 votes

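This version should be linear in the length of the string, and should be fine as long as the sequences are not too repetitive (each match adds a level of recursion, so in that case you can replace the recursion with a while loop).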

def find_all(st, substr, start_pos=0, accum=None):
    accum = [] if accum is None else accum  # avoid a shared mutable default
    ix = st.find(substr, start_pos)
    if ix == -1:
        return accum
    return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])
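
The remark about replacing the recursion with a while loop could be sketched as follows. The recursive function is repeated so the example runs on its own; find_all_loop is a hypothetical name, and the 5000-character test string is just an assumed input chosen to exceed CPython's default recursion limit of 1000.

```python
def find_all(st, substr, start_pos=0, accum=None):
    # Recursive version: one stack frame per match.
    accum = [] if accum is None else accum
    ix = st.find(substr, start_pos)
    if ix == -1:
        return accum
    return find_all(st, substr, start_pos=ix + 1, accum=accum + [ix])

def find_all_loop(st, substr, start_pos=0):
    # Loop-based counterpart (hypothetical name): no recursion depth to exhaust.
    indexes = []
    ix = st.find(substr, start_pos)
    while ix != -1:
        indexes.append(ix)
        ix = st.find(substr, ix + 1)
    return indexes

text = 'a' * 5000  # highly repetitive input: 5000 matches
try:
    find_all(text, 'a')
except RecursionError:
    print('recursive version exceeded the recursion limit')

print(len(find_all_loop(text, 'a')))
print(find_all_loop('Allowed Hello Hollow', 'll'))
```

Appending to one list also avoids the list copy that accum + [ix] makes on every match.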

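bstpierre's list comprehension is a good solution for short sequences, but it appears to have quadratic complexity and never finished on a long text I was using.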

findall_lc = lambda txt, substr: [n for n in xrange(len(txt))
                                   if txt.find(substr, n) == n]

For a random string of non-trivial length, the two functions give the same result:

import random, string; random.seed(0)
s = ''.join([random.choice(string.ascii_lowercase) for _ in range(100000)])

>>> find_all(s, 'th') == findall_lc(s, 'th')
True
>>> findall_lc(s, 'th')[:4]
[564, 818, 1872, 2470]

But the quadratic version is about 300 times slower:

%timeit find_all(s, 'th')
1000 loops, best of 3: 282 µs per loop

%timeit findall_lc(s, 'th')    
10 loops, best of 3: 92.3 ms per loop
beardc answered 2019-10-06T14:14:22Z
0 votes
#!/usr/bin/env python3
#-*- coding: utf-8 -*-

main_string = input()
sub_string = input()

count = counter = 0

for i in range(len(main_string)):
    if main_string[i] == sub_string[0]:
        k = i + 1
        for j in range(1, len(sub_string)):
            if k != len(main_string) and main_string[k] == sub_string[j]:
                count += 1
                k += 1
        if count == (len(sub_string) - 1):
            counter += 1
        count = 0

print(counter) 

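This program counts all occurrences of the substring, even overlapping ones, without using regular expressions. It is a naive implementation, though; for better worst-case performance, consider suffix trees, KMP, or other string-matching data structures and algorithms.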
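
For readers curious about the KMP suggestion, a minimal sketch of Knuth-Morris-Pratt matching might look like the following. It is an illustration under the assumption of a non-empty pattern, not the answer author's code, and kmp_find_all is a hypothetical name.

```python
def kmp_find_all(text, pattern):
    """Return start indexes of all (possibly overlapping) matches of pattern.

    kmp_find_all is a hypothetical name; a non-empty pattern is assumed.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")

    # failure[i] is the length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # fall back so overlapping matches are found
    return matches

print(kmp_find_all('Allowed Hello Hollow', 'll'))  # [1, 10, 16]
print(kmp_find_all('abababa', 'aba'))              # [0, 2, 4]
```

Each text character is examined a bounded number of times on average, so the scan stays linear in len(text) + len(pattern) even on highly repetitive input.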

pmsh.93 answered 2019-10-06T14:14:54Z
0 votes

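Here is my function for finding multiple occurrences. Unlike the other solutions here, it supports the optional start and end parameters for slicing, just like str.index: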

def all_substring_indexes(string, substring, start=0, end=None):
    result = []
    new_start = start
    while True:
        try:
            index = string.index(substring, new_start, end)
        except ValueError:
            return result
        else:
            result.append(index)
            new_start = index + len(substring)
Elias Zamaria answered 2019-10-06T14:15:24Z
0 votes

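A simple iterative function that returns a list of the indices where the substring occurs (overlapping occurrences included):

def allindices(string, sub):
    l = []
    i = string.find(sub)
    while i >= 0:
        l.append(i)
        i = string.find(sub, i + 1)
    return l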
FReeze FRancis answered 2019-10-06T14:15:51Z
0 votes

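You can split the text to get the relative positions, then sum the consecutive lengths in the list while adding (key length * occurrence number) to get the wanted string indexes.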

>>> key = 'll'
>>> text = "Allowed Hello Hollow"
>>> x = [len(i) for i in text.split(key)[:-1]]
>>> [sum(x[:i+1]) + i*len(key) for i in range(len(x))]
[1, 10, 16]
>>> 
WaKo answered 2019-10-06T14:16:23Z
0 votes

Maybe not so Pythonic, but somewhat more self-explanatory. It returns the positions of the word in the original string. Call it with an empty list for result and 0 for base_counter.

def retrieve_occurences(sequence, word, result, base_counter):
    indx = sequence.find(word)
    if indx == -1:
        return result
    result.append(indx + base_counter)
    base_counter += indx + len(word)
    return retrieve_occurences(sequence[indx + len(word):], word, result, base_counter)
blasrodri answered 2019-10-06T14:16:48Z
0 votes

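This link explains how to do the whole thing in O(n), and it includes a solution in Python as well.

If you go further down the sets to "Suffix trees", you can do the same thing when you have one big string but want to search it for thousands of patterns.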

Abhishek Jebaraj answered 2019-10-06T14:17:27Z
0 votes

I think there's no need to test the length of the text; just keep searching until nothing more is found. Like this:

    >>> text = 'Allowed Hello Hollow'
    >>> place = 0
    >>> while text.find('ll', place) != -1:
            print('ll found at', text.find('ll', place))
            place = text.find('ll', place) + 2


    ll found at 1
    ll found at 10
    ll found at 16
rdo answered 2019-10-06T14:17:55Z
0 votes

You can also do it with a conditional list comprehension:

string1 = "Allowed Hello Hollow"
string2 = "ll"
print([num for num in range(len(string1) - len(string2) + 1) if string1[num:num + len(string2)] == string2])
# [1, 10, 16]
Stefan Gruenwald answered 2019-10-06T14:18:26Z
0 votes

I randomly got this idea a while ago. Using a while loop with string slicing and string searching works even when the occurrences overlap.

findin = "algorithm alma mater alison alternation alpines"
search = "al"
inx = 0
num_str = 0

while True:
    inx = findin.find(search)
    if inx == -1: #breaks before adding 1 to number of string
        break
    inx = inx + 1
    findin = findin[inx:] #to splice the 'unsearched' part of the string
    num_str = num_str + 1 #counts no. of string

if num_str != 0:
    print("There are ",num_str," ",search," in your string.")
else:
    print("There are no ",search," in your string.")

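I'm an amateur in Python programming (in programming in any language, actually) and am not sure what other issues it might have, but I guess it works fine?

I guess lower() could be used somewhere in it too, if needed.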
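
As a sketch of how lower() might fit in, both strings could be lower-cased before searching so that the count ignores case. The function name count_ignoring_case and the sample text are assumptions made for this illustration.

```python
def count_ignoring_case(text, sub):
    """Count (possibly overlapping) occurrences of sub in text, ignoring case.

    count_ignoring_case is a hypothetical name used for illustration.
    """
    # Lower-case both sides once, then search the normalized copies.
    text, sub = text.lower(), sub.lower()
    count = 0
    index = text.find(sub)
    while index != -1:
        count += 1
        index = text.find(sub, index + 1)
    return count

print(count_ignoring_case("Algorithm ALMA mater Alison", "al"))  # 3
```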

Mystearica Primal Fende answered 2019-10-06T14:19:14Z
Translated from https://stackoverflow.com:/questions/3873361/finding-multiple-occurrences-of-a-string-within-a-string-in-python